1
|
Nóbrega T, Pires CES, Nascimento DC. Blockchain-based Privacy-Preserving Record Linkage: enhancing data privacy in an untrusted environment. INFORM SYST 2021. [DOI: 10.1016/j.is.2021.101826] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 10/21/2022]
|
2
|
Durham EA, Kantarcioglu M, Xue Y, Toth C, Kuzu M, Malin B. Composite Bloom Filters for Secure Record Linkage. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 2014; 26:2956-2968. [PMID: 25530689 PMCID: PMC4269299 DOI: 10.1109/tkde.2013.91] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Indexed: 06/03/2023]
Abstract
The process of record linkage seeks to integrate instances that correspond to the same entity. Record linkage has traditionally been performed through the comparison of identifying field values (e.g., Surname); however, when databases are maintained by disparate organizations, the disclosure of such information can breach the privacy of the corresponding individuals. Various private record linkage (PRL) methods have been developed to obscure such identifiers, but they vary widely in their ability to balance competing goals of accuracy, efficiency and security. The tokenization and hashing of field values into Bloom filters (BF) enables greater linkage accuracy and efficiency than other PRL methods, but the encodings may be compromised through frequency-based cryptanalysis. Our objective is to adapt a BF encoding technique to mitigate such attacks with minimal sacrifices in accuracy and efficiency. To accomplish these goals, we introduce a statistically-informed method to generate BF encodings that integrate bits from multiple fields, the frequencies of which are provably associated with a minimum number of fields. Our method enables a user-specified tradeoff between security and accuracy. We compare our encoding method with other techniques using a public dataset of voter registration records and demonstrate that the increases in security come with only minor losses to accuracy.
Affiliation(s)
- Murat Kantarcioglu
- Department of Computer Science, University of Texas at Dallas, Richardson, TX, 75083.
- Yuan Xue
- Dept. of Electrical Engineering & Computer Science, Vanderbilt University, Nashville, TN 37232.
- Csaba Toth
- Dept. of Biomedical Informatics, Vanderbilt University, Nashville, TN 37232.
- Mehmet Kuzu
- Dept. of Computer Science, University of Texas at Dallas, Richardson, TX, 75083.
- Bradley Malin
- Depts. of Biomedical Informatics and EECS, Vanderbilt University, Nashville, TN 37232.
|
3
|
Abstract
Data integration occurs when a query proceeds through multiple data sets, thereby relating diverse data extracted from different data sources. Data integration is particularly important to biomedical researchers since data obtained from experiments on human tissue specimens have little applied value unless they can be combined with medical data (i.e., pathologic and clinical information). In the past, research data were correlated with medical data by manually retrieving, reading, assembling and abstracting patient charts, pathology reports, radiology reports and the results of special tests and procedures. Manual annotation of research data is impractical when experiments involve hundreds or thousands of tissue specimens resulting in large, complex data collections. The purpose of this paper is to review how XML (eXtensible Markup Language) provides the fundamental tools that support biomedical data integration. The article also discusses some of the most important challenges that block the widespread availability of annotated biomedical data sets.
Affiliation(s)
- Jules J Berman
- Pathology Informatics, Cancer Diagnosis Program, National Cancer Institute, Rockville, MD 20892, USA.
|
4
|
Durham E, Xue Y, Kantarcioglu M, Malin B. Quantifying the Correctness, Computational Complexity, and Security of Privacy-Preserving String Comparators for Record Linkage. AN INTERNATIONAL JOURNAL ON INFORMATION FUSION 2012; 13:245-259. [PMID: 22904698 PMCID: PMC3418825 DOI: 10.1016/j.inffus.2011.04.004] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Indexed: 06/01/2023]
Abstract
Record linkage is the task of identifying records from disparate data sources that refer to the same entity. It is an integral component of data processing in distributed settings, where the integration of information from multiple sources can prevent duplication and enrich overall data quality, thus enabling more detailed and correct analysis. Privacy-preserving record linkage (PPRL) is a variant of the task in which data owners wish to perform linkage without revealing identifiers associated with the records. This task is desirable in various domains, including healthcare, where it may not be possible to reveal patient identity due to confidentiality requirements, and in business, where it could be disadvantageous to divulge customers' identities. To perform PPRL, it is necessary to apply string comparators that function in the privacy-preserving space. A number of privacy-preserving string comparators (PPSCs) have been proposed, but little research has compared them in the context of a real record linkage application. This paper performs a principled and comprehensive evaluation of six PPSCs in terms of three key properties: 1) correctness of record linkage predictions, 2) computational complexity, and 3) security. We utilize a real publicly-available dataset, derived from the North Carolina voter registration database, to evaluate the tradeoffs between the aforementioned properties. Among our results, we find that PPSCs that partition, encode, and compare strings yield highly accurate record linkage results. However, as a tradeoff, we observe that such PPSCs are less secure than those that map and compare strings in a reduced dimensional space.
Affiliation(s)
- Elizabeth Durham
- Department of Biomedical Informatics, Vanderbilt University, 2525 West End Avenue, Nashville, TN 37203, USA
- Yuan Xue
- Department of Electrical Engineering & Computer Science, Vanderbilt University, 400 24th Avenue South, Nashville, TN 37212, USA
- Murat Kantarcioglu
- Department of Computer Science, University of Texas at Dallas, 2601 North Floyd Road, Richardson, TX 75083, USA
- Bradley Malin
- Department of Biomedical Informatics, Vanderbilt University, 2525 West End Avenue, Nashville, TN 37203, USA
- Department of Electrical Engineering & Computer Science, Vanderbilt University, 400 24th Avenue South, Nashville, TN 37212, USA
|
5
|
El Emam K, Samet S, Hu J, Peyton L, Earle C, Jayaraman GC, Wong T, Kantarcioglu M, Dankar F, Essex A. A Protocol for the secure linking of registries for HPV surveillance. PLoS One 2012; 7:e39915. [PMID: 22768321 PMCID: PMC3388071 DOI: 10.1371/journal.pone.0039915] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 07/29/2011] [Accepted: 06/04/2012] [Indexed: 11/19/2022] Open
Abstract
INTRODUCTION: In order to monitor the effectiveness of HPV vaccination in Canada, the linkage of multiple data registries may be required. These registries may not always be managed by the same organization and, furthermore, privacy legislation or practices may restrict the record linkages that can actually be performed among registries. The objective of this study was to develop a secure protocol for linking data from different registries and to allow on-going monitoring of HPV vaccine effectiveness. METHODS: A secure linking protocol, using commutative hash functions and secure multi-party computation techniques, was developed. This protocol allows for the exact matching of records among registries and the computation of statistics on the linked data while meeting five practical requirements to ensure patient confidentiality and privacy. The statistics considered were: odds ratio and its confidence interval, chi-square test, and relative risk and its confidence interval. Additional statistics on contingency tables, such as other measures of association, can be added using the same principles presented. The computation time performance of this protocol was evaluated. RESULTS: The protocol has acceptable computation time and scales linearly with the size of the data set and the size of the contingency table. The worst-case computation time for up to 100,000 patients returned by each query and a 16-cell contingency table is less than 4 hours for basic statistics, and the best case is under 3 hours. DISCUSSION: A computationally practical protocol for the secure linking of data from multiple registries has been demonstrated in the context of HPV vaccine initiative impact assessment. The basic protocol can be generalized to the surveillance of other conditions, diseases, or vaccination programs.
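The abstract does not spell out the protocol, so the following is only a rough sketch of the general idea behind exact matching with a commutative function, not the authors' method. It assumes modular exponentiation as the commutative operation; the modulus `P`, the function names, and the sample identifiers are all hypothetical, and the secure multi-party computation of statistics is not covered.

```python
import hashlib
import secrets

# Assumption: a toy-sized prime modulus (2**127 - 1 is a Mersenne prime).
# It is far too small for real use and serves only to show the property.
P = 2**127 - 1


def hash_to_group(identifier):
    """Hash an identifier to an integer in the range [1, P - 1]."""
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def blind(value, key):
    """Raise value to a secret exponent; (x**a)**b == (x**b)**a mod P."""
    return pow(value, key, P)


def count_matches(registry_a, registry_b):
    """Count identifiers common to both registries using doubly blinded values."""
    key_a = secrets.randbelow(P - 3) + 2  # secret held by registry A
    key_b = secrets.randbelow(P - 3) + 2  # secret held by registry B
    # Each registry blinds its own identifiers with its own key ...
    a_once = [blind(hash_to_group(x), key_a) for x in registry_a]
    b_once = [blind(hash_to_group(x), key_b) for x in registry_b]
    # ... then the other registry applies its key on top.
    a_twice = {blind(v, key_b) for v in a_once}
    b_twice = {blind(v, key_a) for v in b_once}
    # Because the operation commutes, equal identifiers yield equal values.
    return len(a_twice & b_twice)


# Hypothetical identifiers; two of them appear in both registries.
shared = count_matches(["HC-1001", "HC-1002", "HC-1003"],
                       ["HC-1003", "HC-1004", "HC-1001"])
```

Neither side sees the other's raw identifiers in this sketch, but a production design would need a properly sized group, key validation, and protection against dictionary attacks on low-entropy identifiers.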
Affiliation(s)
- Khaled El Emam
- Electronic Health Information Laboratory, Children's Hospital of Eastern Ontario Research Institute, Ottawa, Ontario, Canada.
|
6
|
Malin B. Secure construction of k-unlinkable patient records from distributed providers. Artif Intell Med 2010; 48:29-41. [PMID: 19875273 DOI: 10.1016/j.artmed.2009.09.002] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 11/25/2008] [Revised: 06/08/2009] [Accepted: 09/12/2009] [Indexed: 11/29/2022]
Affiliation(s)
- Bradley Malin
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN 37203, USA.
|
7
|
|
8
|
Schnell R, Bachteler T, Reiher J. Privacy-preserving record linkage using Bloom filters. BMC Med Inform Decis Mak 2009; 9:41. [PMID: 19706187 PMCID: PMC2753305 DOI: 10.1186/1472-6947-9-41] [Citation(s) in RCA: 98] [Impact Index Per Article: 6.1] [Received: 04/19/2009] [Accepted: 08/25/2009] [Indexed: 11/10/2022] Open
Abstract
Background: Combining multiple databases with disjunctive or additional information on the same person is occurring increasingly throughout research. If unique identification numbers for these individuals are not available, probabilistic record linkage is used for the identification of matching record pairs. In many applications, identifiers have to be encrypted due to privacy concerns. Methods: A new protocol for privacy-preserving record linkage with encrypted identifiers allowing for errors in identifiers has been developed. The protocol is based on Bloom filters on q-grams of identifiers. Results: Tests on simulated and actual databases yield linkage results comparable to non-encrypted identifiers and superior to results from phonetic encodings. Conclusion: We proposed a protocol for privacy-preserving record linkage with encrypted identifiers allowing for errors in identifiers. Since the protocol can be easily enhanced and has a low computational burden, the protocol might be useful for many applications requiring privacy-preserving record linkage.
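As an illustration of what "Bloom filters on q-grams of identifiers" can look like, here is a minimal sketch. It is not the authors' implementation: the filter size, number of hash functions, keyed-hash construction, shared key, and function names are all assumptions chosen for the example, and the Dice coefficient is used as one common way to compare two bit sets.

```python
import hashlib
import hmac


def qgrams(value, q=2):
    """Return the set of overlapping q-grams of a padded, lower-cased string."""
    padded = " " * (q - 1) + value.lower() + " " * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def bloom_encode(value, size=1000, num_hashes=30, key=b"shared-secret"):
    """Hash every q-gram num_hashes times; return the set of bit positions set."""
    bits = set()
    for gram in qgrams(value):
        for i in range(num_hashes):
            # Assumption: a keyed hash (HMAC-SHA256) salted with the index i
            # stands in for a family of independent hash functions.
            message = f"{i}:{gram}".encode("utf-8")
            digest = hmac.new(key, message, hashlib.sha256).digest()
            bits.add(int.from_bytes(digest[:8], "big") % size)
    return bits


def dice(bits_a, bits_b):
    """Dice coefficient of two bit-position sets (1.0 means identical)."""
    total = len(bits_a) + len(bits_b)
    return 2 * len(bits_a & bits_b) / total if total else 0.0


# A spelling variant shares most q-grams; an unrelated name shares none.
similar = dice(bloom_encode("smith"), bloom_encode("smyth"))
unrelated = dice(bloom_encode("smith"), bloom_encode("jones"))
```

Because similar strings share q-grams, their encodings overlap in many bit positions, which is what lets the comparison tolerate errors in identifiers without exposing the plain text.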
Affiliation(s)
- Rainer Schnell
- Methodology Research Unit, Department of Social Sciences, University of Duisburg-Essen, D-47057 Duisburg, Germany.
|
9
|
Kantarcioglu M, Jiang W, Malin B. A Privacy-Preserving Framework for Integrating Person-Specific Databases. PRIVACY IN STATISTICAL DATABASES 2008. [DOI: 10.1007/978-3-540-87471-3_25] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Indexed: 01/11/2023]
|
10
|
Abstract
The widespread application of tissue microarrays in cancer research and the clinical pathology laboratory demonstrates a versatile and portable technology. The rapid integration of tissue microarrays into biomarker discovery and validation processes reflects the forward thinking of researchers who have pioneered the high-density tissue microarray. The precise arrangement of hundreds of archival clinical tissue samples into a composite tissue microarray block is now a proven method for the efficient and standardized analysis of molecular markers. With applications in cancer research, tissue microarrays are a valuable tool in validating candidate markers discovered in highly sensitive genome-wide microarray experiments. With applications in clinical pathology, tissue microarrays are used widely in immunohistochemistry quality control and quality assurance. The timeline of a biomarker implicated in prostate neoplasia, which was identified by complementary DNA expression profiling, validated by tissue microarrays and is now used as a prognostic immunohistochemistry marker, is reviewed. The tissue microarray format provides opportunities for digital imaging acquisition, image processing and database integration. Advances in digital imaging help to alleviate previous bottlenecks in the research pipeline, permit computer image scoring and convey telepathology opportunities for remote image analysis. The tissue microarray industry now includes public and private sectors with varying degrees of research utility and offers a range of potential tissue microarray applications in basic research, prognostic oncology and drug discovery.
Affiliation(s)
- Aprill Watanabe
- TMA Core Service, Translational Genomics Research Institute, 400 N. 5 Street, Phoenix AZ 85004, USA
|
11
|
Abstract
It is impossible to overstate the importance of XML (eXtensible Markup Language) as a data organization tool. With XML, pathologists can annotate all of their data (clinical and anatomic) in a format that can transform every pathology report into a database, without compromising narrative structure. The purpose of this manuscript is to provide an overview of XML for pathologists. Examples will demonstrate how pathologists can use XML to annotate individual data elements and to structure reports in a common format that can be merged with other XML files or queried using standard XML tools. This manuscript gives pathologists a glimpse into how XML allows pathology data to be linked to other types of biomedical data and reduces our dependence on centralized proprietary databases.
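To make the idea of annotating data elements inside a narrative report concrete, a small sketch follows. The element names, attributes, and report text are invented for illustration and do not come from the manuscript; the query uses Python's standard `xml.etree.ElementTree` module as one example of a standard XML tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical report: tags annotate data elements inside ordinary prose.
REPORT = """<pathology_report id="R-0001">
  <specimen site="prostate">needle biopsy</specimen>
  <diagnosis>adenocarcinoma</diagnosis>
  <narrative>Sections show <finding>perineural invasion</finding> in two cores.</narrative>
</pathology_report>"""

root = ET.fromstring(REPORT)

# Query the annotated elements as if the report were a small database.
report_id = root.get("id")
diagnoses = [d.text for d in root.iter("diagnosis")]
findings = [f.text for f in root.iter("finding")]

# The narrative sentence is still recoverable with the markup stripped.
narrative = "".join(root.find("narrative").itertext())
```

The same file serves both purposes here: the tagged elements can be extracted or merged with other XML files, while the surrounding sentence reads as it would in an untagged report.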
Affiliation(s)
- Jules J Berman
- Cancer Diagnosis Program, National Cancer Institute, National Institutes of Health, Bethesda, USA.
|
12
|
Berman JJ. In Reply. Arch Pathol Lab Med 2004. [DOI: 10.5858/2004-128-954b-ir] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Affiliation(s)
- Jules J. Berman
- Program Director for Pathology Informatics, Cancer Diagnosis Program, DCTD, NCI, NIH Rockville, MD 20892
|