Reference Citation Analysis

For: Winkler WE. Matching and record linkage. WIREs Comput Stat 2014. [DOI: 10.1002/wics.1317] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Indexed: 11/12/2022]


Cited by Other Article(s)

Rempel JL, Belfer E, Ray I, Morello-Frosch R. Access for sale? Overlying rights, land transactions, and groundwater in California. Environmental Research Letters 2024;19:024017. [PMID: 38283952 PMCID: PMC10811753 DOI: 10.1088/1748-9326/ad0f71] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Revised: 10/04/2023] [Accepted: 11/23/2023] [Indexed: 01/30/2024]

Xu H, Li X, Zhang Z, Grannis S. Score test for assessing the conditional dependence in latent class models and its application to record linkage. J R Stat Soc Ser C Appl Stat 2022. [DOI: 10.1111/rssc.12590] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]

A prior for record linkage based on allelic partitions. Comput Stat Data Anal 2022. [DOI: 10.1016/j.csda.2022.107474] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/23/2022]

A Record Linkage-Based Data Deduplication Framework with DataCleaner Extension. Multimodal Technologies and Interaction 2022. [DOI: 10.3390/mti6040027] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/16/2022]

Abstract The data management process is characterised by a set of tasks where data quality management (DQM) is one of the core components. Data quality, however, is a multidimensional concept, where the nature of the data quality issues is very diverse. One of the most widely anticipated data quality challenges, which becomes particularly vital when data come from multiple data sources which is a typical situation in the current data-driven world, is duplicates or non-uniqueness. Even more, duplicates were recognised to be one of the key domain-specific data quality dimensions in the context of the Internet of Things (IoT) application domains, where smart grids and health dominate most. Duplicate data lead to inaccurate analyses, leading to wrong decisions, negatively affect data-driven and/or data processing activities such as the development of models, forecasts, simulations, have a negative impact on customer service, risk and crisis management, service personalisation in terms of both their accuracy and trustworthiness, decrease user adoption and satisfaction, etc. The process of determination and elimination of duplicates is known as deduplication, while the process of finding duplicates in one or more databases that refer to the same entities is known as Record Linkage. To find the duplicates, the data sets are compared with each other using similarity functions that are usually used to compare two input strings to find similarities between them, which requires quadratic time complexity. To defuse the quadratic complexity of the problem, especially in large data sources, record linkage methods, such as blocking and sorted neighbourhood, are used. In this paper, we propose a six-step record linkage deduplication framework. 
The operation of the framework is demonstrated on a simplified example of research data artifacts, such as publications, research projects and others of the real-world research institution representing Research Information Systems (RIS) domain. To make the proposed framework usable we integrated it into a tool that is already used in practice, by developing a prototype of an extension for the well-known DataCleaner. The framework detects and visualises duplicates thereby identifying and providing the user with identified redundancies in a user-friendly manner allowing their further elimination. By removing the redundancies, the quality of the data is improved therefore improving analyses and decision-making. This study makes a call for other researchers to take a step towards the “golden record” that can be achieved when all data quality issues are recognised and resolved, thus moving towards absolute data quality.
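
The blocking idea mentioned in the abstract above can be sketched in Python as follows. This is a minimal illustration only, not the paper's six-step framework or its DataCleaner extension: the blocking key (first three characters of the lowercased title), the 0.9 similarity threshold, and the sample titles are assumptions chosen for the example, and the standard-library difflib.SequenceMatcher stands in for whatever similarity function a real system would use.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def blocking_key(title: str) -> str:
    # Hypothetical blocking key: first three characters of the lowercased title.
    return title.strip().lower()[:3]


def candidate_pairs(records):
    # Group record indices by blocking key, then pair records only within a block.
    # This avoids comparing every record with every other record.
    blocks = defaultdict(list)
    for idx, title in enumerate(records):
        blocks[blocking_key(title)].append(idx)
    for members in blocks.values():
        yield from combinations(members, 2)


def find_duplicates(records, threshold=0.9):
    # Flag a candidate pair as a likely duplicate when the string
    # similarity ratio reaches the (assumed) threshold.
    duplicates = []
    for i, j in candidate_pairs(records):
        score = SequenceMatcher(None, records[i].lower(), records[j].lower()).ratio()
        if score >= threshold:
            duplicates.append((i, j))
    return duplicates


# Illustrative sample titles (assumed data, not taken from the paper).
records = [
    "Matching and record linkage",
    "Matching and record linkage.",
    "Small area estimation with linked data",
    "Knowledge Graphs: A Practical Review",
]
pairs = list(candidate_pairs(records))
dups = find_duplicates(records)
```

With four records, an exhaustive comparison would examine six pairs; under the assumed key, blocking examines only the single pair that shares a block. The trade-off is that a poorly chosen key can place true duplicates in different blocks, so they are never compared.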

Binette O, Steorts RC. (Almost) all of entity resolution. Science Advances 2022;8:eabi8021. [PMID: 35333582 DOI: 10.1126/sciadv.abi8021] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/14/2023]

Knowledge Graphs: A Practical Review of the Research Landscape. Information 2022. [DOI: 10.3390/info13040161] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/21/2022]

Desmet C, Cook DJ. Recent Developments in Privacy-Preserving Mining of Clinical Data. ACM/IMS Transactions on Data Science 2021;2:28. [PMID: 35018368 PMCID: PMC8746818 DOI: 10.1145/3447774] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2020] [Accepted: 01/01/2021] [Indexed: 06/14/2023]

Xu H, Li X, Grannis S. A simple two-step procedure using the Fellegi-Sunter model for frequency-based record linkage. J Appl Stat 2021;49:2789-2804. [PMID: 35909667 DOI: 10.1080/02664763.2021.1922615] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/21/2022]

Marchant NG, Kaplan A, Elazar DN, Rubinstein BIP, Steorts RC. d-blink: Distributed End-to-End Bayesian Entity Resolution. J Comput Graph Stat 2021. [DOI: 10.1080/10618600.2020.1825451] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 10/23/2022]

Salvati N, Fabrizi E, Ranalli MG, Chambers RL. Small area estimation with linked data. J R Stat Soc Series B Stat Methodol 2020. [DOI: 10.1111/rssb.12401] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/27/2022]

Stammler S, Kussel T, Schoppmann P, Stampe F, Tremper G, Katzenbeisser S, Hamacher K, Lablans M. Mainzelliste SecureEpiLinker (MainSEL): Privacy-Preserving Record Linkage using Secure Multi-Party Computation. Bioinformatics 2020;38:1657-1668. [PMID: 32871006 PMCID: PMC8896632 DOI: 10.1093/bioinformatics/btaa764] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/25/2019] [Revised: 07/24/2020] [Accepted: 08/25/2020] [Indexed: 11/17/2022]

Ong TC, Duca LM, Kahn MG, Crume TL. A hybrid approach to record linkage using a combination of deterministic and probabilistic methodology. J Am Med Inform Assoc 2020;27:505-513. [PMID: 32049329 DOI: 10.1093/jamia/ocz232] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/17/2019] [Revised: 12/02/2019] [Accepted: 01/06/2020] [Indexed: 11/14/2022]

Fernández-Álvarez D, Gayo JEL, Gayo-Avello D, Ordóñez de Pablos P. MERA. Int J Semant Web Inf 2017. [DOI: 10.4018/ijswis.2017100103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]

Linkage disequilibrium matches forensic genetic records to disjoint genomic marker sets. Proc Natl Acad Sci U S A 2017;114:5671-5676. [PMID: 28507140 PMCID: PMC5465933 DOI: 10.1073/pnas.1619944114] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Indexed: 01/02/2023]

Lohr SL, Raghunathan TE. Combining Survey Data with Other Data Sources. Stat Sci 2017. [DOI: 10.1214/16-sts584] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Indexed: 11/19/2022]

Croset S, Rupp J, Romacker M. Flexible data integration and curation using a graph-based approach. Bioinformatics 2016;32:918-25. [PMID: 26556384 DOI: 10.1093/bioinformatics/btv644] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 06/05/2015] [Accepted: 10/21/2015] [Indexed: 11/14/2022]

Unsupervised Entity Resolution on Multi-type Graphs. Lecture Notes in Computer Science 2016. [DOI: 10.1007/978-3-319-46523-4_39] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 12/12/2022]