Reference Citation Analysis

Cited by Other Article(s)

Shan M, Thomas KS, Gutman R. A Bayesian MultiLayer Record Linkage Procedure to Analyze Post-Acute Care Recovery of Patients with Traumatic Brain Injury. Biostatistics 2023;24:743-759. [PMID: 35579386 PMCID: PMC10345988 DOI: 10.1093/biostatistics/kxac016] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/08/2021] [Revised: 04/11/2022] [Accepted: 04/18/2022] [Indexed: 07/20/2023]

Cardinal RN, Moore A, Burchell M, Lewis JR. De-identified Bayesian personal identity matching for privacy-preserving record linkage despite errors: development and validation. BMC Med Inform Decis Mak 2023;23:85. [PMID: 37147600 PMCID: PMC10163749 DOI: 10.1186/s12911-023-02176-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2022] [Accepted: 04/21/2023] [Indexed: 05/07/2023]

Abstract

BACKGROUND

Epidemiological research may require linkage of information from multiple organizations. This can bring two problems: (1) the information governance desirability of linkage without sharing direct identifiers, and (2) a requirement to link databases without a common person-unique identifier.

METHODS

We develop a Bayesian matching technique to solve both. We provide an open-source software implementation capable of de-identified probabilistic matching despite discrepancies, via fuzzy representations and complete mismatches, plus de-identified deterministic matching if required. We validate the technique by testing linkage between multiple medical records systems in a UK National Health Service Trust, examining the effects of decision thresholds on linkage accuracy. We report demographic factors associated with correct linkage.

RESULTS

The system supports dates of birth (DOBs), forenames, surnames, three-state gender, and UK postcodes. Fuzzy representations are supported for all except gender, and there is support for additional transformations, such as accent misrepresentation, variation for multi-part surnames, and name re-ordering. Calculated log odds predicted a proband's presence in the sample database with an area under the receiver operating characteristic curve of 0.997-0.999 for non-self database comparisons. Log odds were converted to a decision via a consideration threshold θ and a leader advantage threshold δ. Defaults were chosen to penalize misidentification 20-fold versus linkage failure. By default, complete DOB mismatches were disallowed for computational efficiency. At these settings, for non-self database comparisons, the mean probability of a proband being correctly declared to be in the sample was 0.965 (range 0.931-0.994), and the misidentification rate was 0.00249 (range 0.00123-0.00429). Correct linkage was positively associated with male gender, Black or mixed ethnicity, and the presence of diagnostic codes for severe mental illnesses or other mental disorders, and negatively associated with birth year, unknown ethnicity, residential area deprivation, and presence of a pseudopostcode (e.g. indicating homelessness). Accuracy rates would be improved further if person-unique identifiers were also used, as supported by the software. Our two largest databases were linked in 44 min via an interpreted programming language.

CONCLUSIONS

Fully de-identified matching with high accuracy is feasible without a person-unique identifier and appropriate software is freely available.
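The decision step summarized in the Results above (log odds converted to a decision via a consideration threshold θ and a leader advantage threshold δ) can be illustrated with a short Python sketch. It is not the authors' published software: the function names are hypothetical, natural-log odds are assumed, and the rule is one plausible reading of the abstract (candidates below θ are ignored, and the best remaining candidate must beat the runner-up by at least δ). The abstract does not give the default values of θ and δ, so the numbers in the usage lines are invented. The first function shows the break-even log odds under a simple two-outcome cost model in which a misidentification costs 20 times as much as a linkage failure; that cost model is also an assumption, not necessarily how the defaults were derived.

```python
import math
from typing import Mapping, Optional


def breakeven_log_odds(cost_ratio: float) -> float:
    """Natural-log odds above which declaring a match has lower expected cost.

    Assumed cost model: a wrong match costs `cost_ratio` times as much as a
    missed link. Declaring a match with probability p of being correct is
    worthwhile when p * 1 > (1 - p) * cost_ratio, i.e. when
    p / (1 - p) > cost_ratio, i.e. log odds > ln(cost_ratio).
    """
    return math.log(cost_ratio)


def decide_match(
    log_odds: Mapping[str, float], theta: float, delta: float
) -> Optional[str]:
    """Return the sample-record ID declared to match the proband, or None.

    Hypothetical rule: ignore candidates whose log odds are below theta, then
    accept the best remaining candidate (the "leader") only if it beats the
    runner-up among the remaining candidates by at least delta.
    """
    considered = sorted(
        ((score, cid) for cid, score in log_odds.items() if score >= theta),
        reverse=True,
    )
    if not considered:
        return None  # no plausible candidate: declare the proband absent
    leader_score, leader_id = considered[0]
    if len(considered) > 1 and leader_score - considered[1][0] < delta:
        return None  # leader not clear enough: prefer a missed link to a wrong one
    return leader_id


# Usage with invented log odds and invented thresholds.
candidates = {"S1": 12.4, "S2": 4.1, "S3": -3.0}
print(round(breakeven_log_odds(20), 2))  # 3.0
print(decide_match(candidates, theta=5.0, delta=3.0))  # S1
```

Returning None when the leader's margin is below δ reflects the asymmetric costs: an ambiguous case is treated as a linkage failure rather than risking a misidentification.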

Sato J, Mitsutake N, Yamada H, Kitsuregawa M, Goda K. Virtual patient identifier (vPID): Improving patient traceability using anonymized identifiers in Japanese healthcare insurance claims database. Heliyon 2023;9:e16209. [PMID: 37234615 PMCID: PMC10205637 DOI: 10.1016/j.heliyon.2023.e16209] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/31/2023] [Revised: 05/09/2023] [Accepted: 05/10/2023] [Indexed: 05/28/2023]

Abstract

Objective

Japan's national-level healthcare insurance claims database (NDB) is a collective database that contains information on all healthcare services provided to all citizens. However, the existing anonymized identifiers (ID1 and ID2) trace patients' claims poorly, hindering longitudinal analyses. This study presents a virtual patient identifier (vPID), which we have built on top of these existing identifiers, to improve patient traceability.

Methods

vPID is a new composite identifier that consolidates the ID1 and ID2 values co-occurring in the same claim, so that each patient's claims can be collected even when the patient's ID1 or ID2 changes because of life events or clerical errors. We conducted a verification test with prefecture-level datasets of healthcare insurance claims and enrollee history records, which allowed us to compare vPID with the ground truth in terms of an identifiability score (the ability to distinguish one patient's claims from another patient's claims) and a traceability score (the ability to collect all claims of the same patient).

Results

The verification test showed that vPID offers significantly higher traceability scores (0.994, Mie; 0.997, Gifu) than ID1 (0.863, Mie; 0.884, Gifu) and ID2 (0.602, Mie; 0.839, Gifu), with comparable (0.996, Mie) and lower (0.979, Gifu) identifiability scores.

Discussion

vPID appears useful for a wide range of analytic studies, except those that focus on cases sensitive to its design limitations, such as patients who marry and change jobs at the same time, and same-sex twin children.

Conclusion

vPID successfully improves patient traceability, making possible longitudinal analyses of the NDB that were previously impractical. Further work is needed, in particular to mitigate identification errors.
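The consolidation step described in the Methods above can be read as a connected-components problem: each claim links the ID1 and ID2 values that appear on it, and every identifier reachable through such links is grouped into one virtual patient. The Python sketch below does this with a union-find (disjoint-set) structure. It is an illustration under that assumption, not the authors' vPID algorithm, whose details the abstract does not give; the class and function names and the toy claims are hypothetical. It also reproduces the kind of limitation noted in the Discussion: if both ID1 and ID2 change between two claims, no claim links the old pair to the new pair, so one patient is split into two virtual patients.

```python
from typing import Dict, Hashable, Iterable, Tuple


class DisjointSet:
    """Union-find with path compression over arbitrary hashable keys."""

    def __init__(self) -> None:
        self.parent: Dict[Hashable, Hashable] = {}

    def find(self, x: Hashable) -> Hashable:
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the walked path at the root.
        while x != root:
            next_x = self.parent[x]
            self.parent[x] = root
            x = next_x
        return root

    def union(self, a: Hashable, b: Hashable) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def assign_virtual_ids(claims: Iterable[Tuple[str, str, str]]) -> Dict[str, int]:
    """Map each claim ID to a virtual patient number (hypothetical helper).

    `claims` holds (claim_id, id1, id2) tuples. ID1 and ID2 values are tagged
    with their type so that equal strings from the two schemes never merge.
    """
    claims = list(claims)
    groups = DisjointSet()
    for _, id1, id2 in claims:
        groups.union(("ID1", id1), ("ID2", id2))  # co-occurrence on one claim
    root_to_vid: Dict[Hashable, int] = {}
    result: Dict[str, int] = {}
    for claim_id, id1, _ in claims:
        root = groups.find(("ID1", id1))
        result[claim_id] = root_to_vid.setdefault(root, len(root_to_vid))
    return result


# Invented toy claims: one patient whose identifiers change one at a time.
toy_claims = [
    ("c1", "A", "X"),
    ("c2", "B", "X"),  # ID1 changed, ID2 unchanged
    ("c3", "B", "Y"),  # ID2 changed, ID1 unchanged
    ("c4", "C", "Z"),  # an unrelated patient
]
print(assign_virtual_ids(toy_claims))  # {'c1': 0, 'c2': 0, 'c3': 0, 'c4': 1}
```

Union-find keeps each merge close to constant time, which matters when the input is a national claims database rather than a handful of toy rows.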

Smith D, Elliot M, Sakshaug JW. To Link or Synthesize? An Approach to Data Quality Comparison. ACM Journal of Data and Information Quality 2023. [DOI: 10.1145/3580487] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2023]

Extending the Fellegi-Sunter record linkage model for mixed-type data with application to the French national health data system. Comput Stat Data Anal 2022. [DOI: 10.1016/j.csda.2022.107656] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]

A prior for record linkage based on allelic partitions. Comput Stat Data Anal 2022. [DOI: 10.1016/j.csda.2022.107474] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/23/2022]

Tuoto T, Di Cecco D, Tancredi A. Bayesian analysis of one-inflated models for elusive population size estimation. Biom J 2022;64:912-933. [PMID: 35534439 PMCID: PMC9314905 DOI: 10.1002/bimj.202100187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2021] [Revised: 01/11/2022] [Accepted: 02/05/2022] [Indexed: 12/04/2022]

Improving Wildlife Population Inference Using Aerial Imagery and Entity Resolution. Journal of Agricultural, Biological and Environmental Statistics 2022. [DOI: 10.1007/s13253-021-00484-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 10/19/2022]

On the consistent estimation of linkage errors without training data. Japanese Journal of Statistics and Data Science 2022. [DOI: 10.1007/s42081-022-00153-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]

Optimizing the Retrieval of the Vital Status of Cancer Patients for Health Data Warehouses by Using Open Government Data in France. International Journal of Environmental Research and Public Health 2022;19:4272. [PMID: 35409956 PMCID: PMC8998644 DOI: 10.3390/ijerph19074272] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/09/2022] [Revised: 03/22/2022] [Accepted: 03/30/2022] [Indexed: 02/06/2023]

Kaplan A, Betancourt B, Steorts RC. A Practical Approach to Proper Inference with Linked Data. Am Stat 2022. [DOI: 10.1080/00031305.2022.2041482] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/01/2022]

Aleshin-Guendel S, Sadinle M. Multifile Partitioning for Record Linkage and Duplicate Detection. J Am Stat Assoc 2022;118:1786-1795. [PMID: 37771512 PMCID: PMC10530869 DOI: 10.1080/01621459.2021.2013242] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/20/2020] [Accepted: 11/28/2021] [Indexed: 10/19/2022]

Shan M, Thomas KS, Gutman R. A Multiple Imputation Procedure for Record Linkage and Causal Inference to Estimate the Effects of Home-Delivered Meals. Ann Appl Stat 2021;15:412-436. [PMID: 35755005 PMCID: PMC9222523 DOI: 10.1214/20-aoas1397] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/04/2024]

Marchant NG, Kaplan A, Elazar DN, Rubinstein BIP, Steorts RC. d-blink: Distributed End-to-End Bayesian Entity Resolution. J Comput Graph Stat 2021. [DOI: 10.1080/10618600.2020.1825451] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 10/23/2022]

Betancourt B, Zanella G, Steorts RC. Random Partition Models for Microclustering Tasks. J Am Stat Assoc 2020. [DOI: 10.1080/01621459.2020.1841647] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 10/23/2022]

Smith D. Re‐identification in the Absence of Common Variables for Matching. Int Stat Rev 2019. [DOI: 10.1111/insr.12353] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/29/2022]

Xu H, Li X, Shen C, Hui SL, Grannis S. Incorporating conditional dependence in latent class models for probabilistic record linkage: Does it matter? Ann Appl Stat 2019. [DOI: 10.1214/19-aoas1256] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/19/2022]

Zanella G. Informed Proposals for Local MCMC in Discrete Spaces. J Am Stat Assoc 2019. [DOI: 10.1080/01621459.2019.1585255] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 10/27/2022]

Han Y, Lahiri P. Statistical Analysis with Linked Data. Int Stat Rev 2018. [DOI: 10.1111/insr.12295] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 12/01/2022]

Dalzell NM, Reiter JP. Regression Modeling and File Matching Using Possibly Erroneous Matching Variables. J Comput Graph Stat 2018. [DOI: 10.1080/10618600.2018.1458624] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 10/17/2022]

Hurley PD, Oliver S, Mehta A. Creating longitudinal datasets and cleaning existing data identifiers in a cystic fibrosis registry using a novel Bayesian probabilistic approach from astronomy. PLoS One 2018;13:e0199815. [PMID: 29985939 PMCID: PMC6037350 DOI: 10.1371/journal.pone.0199815] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/03/2017] [Accepted: 06/14/2018] [Indexed: 11/18/2022]

Chen B, Shrivastava A, Steorts RC. Unique entity estimation with application to the Syrian conflict. Ann Appl Stat 2018. [DOI: 10.1214/18-aoas1163] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/19/2022]

Sadinle M. Bayesian propagation of record linkage uncertainty into population size estimation of human rights violations. Ann Appl Stat 2018. [DOI: 10.1214/18-aoas1178] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 11/19/2022]

Briscolini D, Di Consiglio L, Liseo B, Tancredi A, Tuoto T. New methods for small area estimation with linkage uncertainty. Int J Approx Reason 2018. [DOI: 10.1016/j.ijar.2017.12.005] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 10/18/2022]

Hof MH, Ravelli AC, Zwinderman AH. A Probabilistic Record Linkage Model for Survival Data. J Am Stat Assoc 2018. [DOI: 10.1080/01621459.2017.1311262] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 10/19/2022]

Goldstein H, Harron K, Cortina-Borja M. A scaling approach to record linkage. Stat Med 2017;36:2514-2521. [PMID: 28303597 PMCID: PMC6205620 DOI: 10.1002/sim.7287] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 08/04/2016] [Accepted: 03/02/2017] [Indexed: 11/10/2022]

Sadinle M. Bayesian Estimation of Bipartite Matchings for Record Linkage. J Am Stat Assoc 2017. [DOI: 10.1080/01621459.2016.1148612] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 10/22/2022]

Steorts RC, Hall R, Fienberg SE. A Bayesian Approach to Graphical Record Linkage and Deduplication. J Am Stat Assoc 2017. [DOI: 10.1080/01621459.2015.1105807] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Indexed: 10/22/2022]

McClintock BT, Bailey LL, Dreher BP, Link WA. Probit models for capture–recapture data subject to imperfect detection, individual heterogeneity and misidentification. Ann Appl Stat 2014. [DOI: 10.1214/14-aoas783] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 11/19/2022]

Sadinle M. Detecting duplicates in a homicide registry using a Bayesian partitioning approach. Ann Appl Stat 2014. [DOI: 10.1214/14-aoas779] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Indexed: 11/19/2022]

Winkler WE. Matching and record linkage. WIREs Comput Stat 2014. [DOI: 10.1002/wics.1317] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Indexed: 11/12/2022]

Kum HC, Krishnamurthy A, Machanavajjhala A, Reiter MK, Ahalt S. Privacy preserving interactive record linkage (PPIRL). J Am Med Inform Assoc 2013;21:212-20. [PMID: 24201028 DOI: 10.1136/amiajnl-2013-002165] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Indexed: 11/04/2022]

Lum K, Price ME, Banks D. Applications of Multiple Systems Estimation in Human Rights Research. Am Stat 2013. [DOI: 10.1080/00031305.2013.821093] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Indexed: 10/26/2022]

Xu H, Hui SL, Grannis S. Optimal two-phase sampling design for comparing accuracies of two binary classification rules. Stat Med 2013;33:500-13. [DOI: 10.1002/sim.5946] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 07/13/2012] [Accepted: 07/22/2013] [Indexed: 11/11/2022]

Sadinle M, Fienberg SE. A Generalized Fellegi–Sunter Framework for Multiple Record Linkage With Application to Homicide Record Systems. J Am Stat Assoc 2013. [DOI: 10.1080/01621459.2012.757231] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Indexed: 10/27/2022]

Gutman R, Afendulis CC, Zaslavsky AM. A Bayesian Procedure for File Linking to Analyze End-of-Life Medical Costs. J Am Stat Assoc 2013;108:34-47. [PMID: 23645944 DOI: 10.1080/01621459.2012.726889] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Indexed: 10/27/2022]

Valid Statistical Inference on Automatically Matched Files. Springer (book chapter) 2012. [DOI: 10.1007/978-3-642-33627-0_11] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2]