1. A Record Linkage-Based Data Deduplication Framework with DataCleaner Extension. Multimodal Technologies and Interaction 2022. DOI: 10.3390/mti6040027. Open access.
Abstract
The data management process is characterised by a set of tasks in which data quality management (DQM) is a core component. Data quality, however, is a multidimensional concept, and the nature of data quality issues is very diverse. One of the most widely anticipated data quality challenges, which becomes particularly vital when data come from multiple sources (a typical situation in the current data-driven world), is duplicates, or non-uniqueness. Moreover, duplicates have been recognised as one of the key domain-specific data quality dimensions in Internet of Things (IoT) application domains, where smart grids and health dominate. Duplicate data lead to inaccurate analyses and thus to wrong decisions; negatively affect data-driven and data-processing activities such as the development of models, forecasts and simulations; harm customer service, risk and crisis management, and service personalisation in terms of both accuracy and trustworthiness; and decrease user adoption and satisfaction. The process of determining and eliminating duplicates is known as deduplication, while the process of finding records in one or more databases that refer to the same entity is known as record linkage. To find duplicates, data sets are compared with each other using similarity functions, which typically compare two input strings, and this requires quadratic time complexity. To defuse the quadratic complexity of the problem, especially for large data sources, record linkage methods such as blocking and sorted neighbourhood are used. In this paper, we propose a six-step record linkage deduplication framework. Its operation is demonstrated on a simplified example of research data artifacts, such as publications and research projects, from a real-world research institution in the Research Information Systems (RIS) domain.
To make the proposed framework usable, we integrated it into a tool that is already used in practice by developing a prototype extension for the well-known DataCleaner. The framework detects and visualises duplicates, presenting the identified redundancies to the user in a user-friendly manner and allowing their subsequent elimination. Removing these redundancies improves the quality of the data and, in turn, analyses and decision-making. This study calls on other researchers to take a step towards the “golden record” that can be achieved when all data quality issues are recognised and resolved, thus moving towards absolute data quality.
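The blocking idea this abstract describes — comparing records only within groups that share a blocking key, instead of all O(n²) pairs — can be sketched as follows. The publication records, the blocking key, and the use of Python's difflib ratio in place of a production similarity function are illustrative assumptions, not the paper's implementation.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; stands in for Jaro-Winkler, Levenshtein, etc."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def dedup_with_blocking(records, key, threshold=0.85):
    """Compare records only within blocks that share a blocking key,
    defusing the quadratic cost of comparing every pair."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[key(rec)].append(rec)
    duplicates = []
    for block in blocks.values():
        for r1, r2 in combinations(block, 2):
            if similarity(r1["title"], r2["title"]) >= threshold:
                duplicates.append((r1["id"], r2["id"]))
    return duplicates

publications = [
    {"id": 1, "title": "A Record Linkage Framework", "year": 2022},
    {"id": 2, "title": "A Record Linkage Framework.", "year": 2022},
    {"id": 3, "title": "Sorted Neighbourhood Methods", "year": 2022},
]
# Blocking key: first letter of the title plus year; records falling in
# different blocks are never compared at all.
pairs = dedup_with_blocking(publications, key=lambda r: (r["title"][:1].upper(), r["year"]))
print(pairs)  # [(1, 2)]
```

The sorted neighbourhood method mentioned in the abstract works similarly, but sorts records on a key and slides a fixed-size window over the sorted list instead of hashing into blocks.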
2. Binette O, Steorts RC.
Abstract
Whether the goal is to estimate the number of people that live in a congressional district, to estimate the number of individuals who have died in an armed conflict, or to disambiguate individual authors using bibliographic data, all these applications have a common theme: integrating information from multiple sources. Before such questions can be answered, databases must be cleaned and integrated in a systematic and accurate way, a process commonly known as structured entity resolution (record linkage or deduplication). Here, we review motivational applications and seminal papers that have led to the growth of this area. We review modern probabilistic and Bayesian methods in statistics, computer science, machine learning, database management, economics, political science, and other disciplines that are used throughout industry and academia in applications such as human rights, official statistics, medicine, and citation networks, among others. Last, we discuss current research topics of practical importance.
Affiliation(s)
- Olivier Binette
- Department of Statistical Science, Duke University, Durham, NC, USA
- Rebecca C Steorts
- Department of Statistical Science, Computer Science, Biostatistics and Bioinformatics, the Rhodes Information Initiative at Duke (iiD) and the Social Science Research Institute (SSRI), Duke University, Durham, NC, USA
- Principal Mathematical Statistician, United States Census Bureau, Washington, DC, USA
3. Ilyas IF, Rekatsinas T. Machine Learning and Data Cleaning: Which Serves the Other? ACM Journal of Data and Information Quality 2022. DOI: 10.1145/3506712.
4. Ali A, Emran NA, Asmai SA. Missing values compensation in duplicates detection using hot deck method. Journal of Big Data 2021; 8:112. DOI: 10.1186/s40537-021-00502-1.
Abstract
Duplicate records are a common problem within data sets, especially in huge-volume databases. The accuracy of duplicate detection determines the efficiency of the duplicate removal process. However, duplicate detection has become more challenging due to the presence of missing values within records: during the clustering and matching process, missing values can cause records deemed similar to be inserted into the wrong group, leading to undetected duplicates. In this paper, an improvement to duplicate detection in the presence of missing values is proposed through the Duplicate Detection within the Incomplete Data set (DDID) method. Missing values were hypothetically added to the key attributes of three data sets under study, using an arbitrary pattern to simulate both complete and incomplete data sets, and the Hot Deck method was used to compensate for the missing values in the key attributes. It was hypothesised that using Hot Deck would improve duplicate detection performance. Furthermore, DDID was compared to an earlier duplicate detection method, DuDe, in terms of accuracy and speed. The findings show that even though the data sets were incomplete, DDID offered better accuracy and faster duplicate detection than DuDe. The results of this study offer insights into the constraints of duplicate detection within incomplete data sets.
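The hot-deck idea — filling a missing key attribute by copying it from a similar "donor" record before matching — can be sketched as follows. This is a generic hot-deck sketch, not the paper's DDID method, and the records and attribute names are invented for illustration.

```python
import random

def hot_deck_impute(records, key_attr, donor_attr):
    """Fill missing values of a key attribute by copying them from a
    'donor' record, preferring donors that agree on another attribute."""
    donors = [r for r in records if r.get(key_attr) is not None]
    for rec in records:
        if rec.get(key_attr) is None:
            pool = [d for d in donors
                    if d.get(donor_attr) == rec.get(donor_attr)] or donors
            rec[key_attr] = random.choice(pool)[key_attr]
    return records

people = [
    {"name": "Ann Lee",  "city": "Oslo",   "zip": "0150"},
    {"name": "Anne Lee", "city": "Oslo",   "zip": None},   # missing key attribute
    {"name": "Bob Roy",  "city": "Bergen", "zip": "5003"},
]
hot_deck_impute(people, key_attr="zip", donor_attr="city")
print(people[1]["zip"])  # 0150, copied from the matching Oslo donor
```

With the key attribute filled in, "Ann Lee" and "Anne Lee" can land in the same block and be recognised as likely duplicates instead of being separated by the missing value.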
5. Niknam M, Minaei-Bidgoli B, Dianat R. The role of transitive closure in evaluating blocking methods for dirty entity resolution. Journal of Intelligent Information Systems 2021. DOI: 10.1007/s10844-021-00676-3.
6. Naranjo-Zeledón L, Chacón-Rivas M, Peral J, Ferrández A. Architecture design of a reinforcement environment for learning sign languages. PeerJ Computer Science 2021; 7:e740. PMID: 34722873; PMCID: PMC8530094; DOI: 10.7717/peerj-cs.740.
Abstract
Different fields such as linguistics, teaching, and computing have shown special interest in the study of sign languages (SL). However, teaching and learning these languages is complex, since it is unusual to find SL teachers who are fluent in both the SL and the native language of the students; the teachings of deaf individuals are therefore unique, and it is important for students to be able to lean on supportive mechanisms while learning an SL. Bidirectional communication between deaf and hearing people through SL is a hot topic for achieving a higher level of inclusion, yet the same scarcity of bilingual teachers also makes it hard to provide computer-based teaching tools for different SLs. Moreover, the main aspects that a second-language learner of an SL finds difficult are phonology, non-manual components, and the use of space (the latter two being specific to SLs, not to spoken languages). This proposal appears to be the first of its kind to support the Costa Rican Sign Language (LESCO, for its Spanish acronym), as well as any other SL. Our research focuses on reinforcing the learning process of hearing end-users through a modular architectural design of a learning environment that relies on the concept of phonological proximity within a graphical tool with a high degree of usability. The aim of incorporating phonological proximity is to help individuals learn signs with similar handshapes. The architecture separates the logic and processing aspects from those associated with data access and generation, which makes it portable to other SLs in the future. The methodology consisted of defining 26 phonological parameters (13 for each hand), thus characterising each sign appropriately.
A similarity formula was then applied to compare each pair of signs; with these pre-calculations, the tool displays each sign together with its ten most similar signs. A SUS usability test, an open qualitative question, and a numerical evaluation by a group of learners were used to validate the proposal. To reach our research aims, we analysed previous work on teaching tools meant for students practising SL, as well as on the importance of phonological proximity in this teaching process; this prior work justifies the necessity of our proposal, whose benefits have been demonstrated through experiments conducted by different users on the usability and usefulness of the tool. To meet these needs, homonymous signs (signs with the same starting handshape) and paronyms (signs with highly similar handshapes) have been included to explore their impact on learning. The same perspective of our existing line of research can be applied to other SLs in the future.
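The parameter-based comparison the abstract describes can be sketched as follows. The paper's exact similarity formula is not reproduced here, so this sketch assumes a simple fraction-of-matching-parameters measure over a toy 6-parameter lexicon (the paper uses 26 parameters per sign, 13 per hand); the sign names and parameter values are invented.

```python
def sign_similarity(params_a, params_b):
    """Fraction of phonological parameters on which two signs agree
    (a stand-in for the paper's similarity formula)."""
    matches = sum(1 for a, b in zip(params_a, params_b) if a == b)
    return matches / len(params_a)

def top_similar(target, lexicon, k=10):
    """Rank all other signs by phonological proximity to the target,
    as the tool does when it displays the ten most similar signs."""
    ranked = sorted(
        ((name, sign_similarity(lexicon[target], params))
         for name, params in lexicon.items() if name != target),
        key=lambda item: item[1], reverse=True)
    return ranked[:k]

# Toy lexicon: 6 parameters per sign instead of the paper's 26.
lexicon = {
    "HOUSE": ["flat", "palm-down", "chest", "arc", "none", "two-handed"],
    "ROOF":  ["flat", "palm-down", "head",  "arc", "none", "two-handed"],
    "NAME":  ["fist", "palm-side", "chin",  "tap", "nod",  "one-handed"],
}
print(top_similar("HOUSE", lexicon, k=2))  # ROOF (5/6 parameters shared) ranks first
```

Because all pairwise similarities can be pre-calculated offline, the interactive tool only needs a lookup at display time.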
Affiliation(s)
- Luis Naranjo-Zeledón
- Inclutec, Costa Rica Institute of Technology, Cartago, Costa Rica
- Department of Languages and Computing Systems, University of Alicante, Alicante, Spain
- Jesús Peral
- Department of Languages and Computing Systems, University of Alicante, Alicante, Spain
- Antonio Ferrández
- Department of Languages and Computing Systems, University of Alicante, Alicante, Spain
7. Fellah A. All-Three: Near-optimal and domain-independent algorithms for near-duplicate detection. Array 2021. DOI: 10.1016/j.array.2021.100070. Open access.
8. Al-Masaeed M, Alghawanmeh M, Al-Singlawi A, Alsababha R, Alqudah M. An Examination of COVID-19 Medications' Effectiveness in Managing and Treating COVID-19 Patients: A Comparative Review. Healthcare (Basel) 2021; 9:557. PMID: 34068474; PMCID: PMC8151388; DOI: 10.3390/healthcare9050557. Open access.
Abstract
Background: The review seeks to shed light on the administered and recommended COVID-19 treatment medications through an evaluation of their efficacy. Methods: Data were collected from key databases, including Scopus, Medline, Google Scholar, and CINAHL, as well as from WHO and FDA publications. The literature search was guided by the scope and trial-assessment parameters of the WHO Solidarity clinical trials for COVID-19. Results: The findings indicate that the use of antiretroviral drugs as an early treatment for COVID-19 patients has been useful: it has reduced hospital time, hastened the clinical cure period, delayed and reduced the need for mechanical and invasive ventilation, and reduced mortality rates. The use of vitamins, minerals, and supplements has been linked to increased immunity, offering the body a fighting chance. Antibiotics, however, do not correlate with improved patient wellbeing and are strongly discouraged by the clinical trials reviewed. Conclusions: The review demonstrates the need for additional randomised clinical trials with an extensive sample base, conducted over a longer period, to examine the potential side effects of the medications administered. Critically, the findings underscore vaccination as the only viable means of limiting the spread of the SARS-CoV-2 virus.
Affiliation(s)
- Mahmoud Al-Masaeed
- Faculty of Health and Medicine, University of Newcastle, Callaghan 2308, Australia;
- Rawan Alsababha
- School of Nursing and Midwifery, Western Sydney University, Sydney 2560, Australia
- Muhammad Alqudah
- Faculty of Health and Medicine, University of Newcastle, Callaghan 2308, Australia;
9. Li Y, Li J, Suhara Y, Wang J, Hirota W, Tan WC. Deep Entity Matching. ACM Journal of Data and Information Quality 2021. DOI: 10.1145/3431816.
Abstract
Entity matching refers to the task of determining whether two different representations refer to the same real-world entity. It continues to be a prevalent problem for many organisations where data reside in different sources and duplicates need to be identified and managed. The term "entity matching" also loosely refers to the broader problem of determining whether two heterogeneous representations of different entities should be associated together. This problem has an even wider scope of applications, from determining the subsidiaries of companies to matching jobs to job seekers, with impactful consequences.
In this article, we first report on our recent system Ditto, an example of a modern entity matching system based on pretrained language models. We then summarise recent solutions that apply deep learning and pretrained language models to the entity matching task. Finally, we discuss research directions beyond entity matching, including the promise of synergistically integrating the blocking and entity matching steps, the need for methods that alleviate the steep training data requirements typical of deep learning and pretrained language models, and the importance of generalising entity matching solutions to handle the broader entity matching problem, which makes the need to explain matching outcomes even more pressing.
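Ditto-style systems cast entity matching as sequence-pair classification: each record is serialized into a flat string and a pretrained language model is fine-tuned on the pair. A minimal sketch of that serialization step follows; the product records are invented, and the actual model fine-tuning is omitted.

```python
def serialize(record: dict) -> str:
    """Flatten a record into the 'COL <attribute> VAL <value>' form that
    Ditto-style matchers feed to a pretrained language model."""
    return " ".join(f"COL {k} VAL {v}" for k, v in record.items())

def make_pair(left: dict, right: dict) -> str:
    """A candidate pair becomes one sequence with a separator token; the
    language model is then fine-tuned to classify it as match / no-match."""
    return f"{serialize(left)} [SEP] {serialize(right)}"

a = {"title": "iPhone 12 64GB", "brand": "Apple"}
b = {"title": "Apple iPhone 12 (64 GB)", "brand": "Apple"}
print(make_pair(a, b))
```

Keeping attribute names in the sequence lets the language model learn which fields matter for a match, rather than relying on hand-crafted per-attribute similarity functions.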
10. Loster M, Koumarelas I, Naumann F. Knowledge Transfer for Entity Resolution with Siamese Neural Networks. ACM Journal of Data and Information Quality 2021. DOI: 10.1145/3410157.
Abstract
The integration of multiple data sources is a common problem in a large variety of applications. Traditionally, handcrafted similarity measures are used to discover, merge, and integrate multiple representations of the same entity—duplicates—into a large homogeneous collection of data. Often, these similarity measures do not cope well with the heterogeneity of the underlying dataset. In addition, domain experts are needed to manually design and configure such measures, which is both time-consuming and requires extensive domain expertise.
We propose a deep Siamese neural network capable of learning a similarity measure that is tailored to the characteristics of a particular dataset. Thanks to the properties of deep learning methods, we are able to eliminate the manual feature engineering process and thus considerably reduce the effort required for model construction. In addition, we show that it is possible to transfer knowledge acquired during the deduplication of one dataset to another, and thus significantly reduce the amount of data required to train a similarity measure. We evaluated our method on multiple datasets and compared our approach to state-of-the-art deduplication methods. Our approach outperforms competitors by up to +26 percent F-measure, depending on the task and dataset. In addition, we show that knowledge transfer is not only feasible, but in our experiments led to an improvement in F-measure of up to +4.7 percent.
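The shared-weights idea behind a Siamese network — both records pass through the same learned encoder, and the distance between the encodings serves as the similarity score — can be sketched in miniature with NumPy. The random "encoder" and toy feature vectors are illustrative assumptions; a real system would train the weights on labelled duplicate pairs, which is exactly the data requirement the paper's knowledge transfer reduces.

```python
import numpy as np

rng = np.random.default_rng(0)

class SiameseEncoder:
    """Both inputs pass through the SAME transformation (shared weights);
    the cosine of the two encodings is the similarity score."""
    def __init__(self, in_dim, out_dim):
        # Untrained random weights; a real system learns these from
        # labelled duplicate / non-duplicate pairs.
        self.W = rng.normal(size=(in_dim, out_dim)) * 0.1

    def encode(self, x):
        return np.tanh(x @ self.W)

    def similarity(self, x1, x2):
        a, b = self.encode(x1), self.encode(x2)
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

# Toy numeric features for two records (e.g. character n-gram counts).
enc = SiameseEncoder(in_dim=8, out_dim=4)
rec = rng.normal(size=8)
near_dup = rec + rng.normal(size=8) * 0.01   # almost identical record
print(enc.similarity(rec, near_dup))          # close to 1.0 for a near-duplicate
```

Knowledge transfer, in this picture, amounts to initialising the encoder weights from a model trained on one dataset before fine-tuning on another.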
Affiliation(s)
- Michael Loster
- Hasso Plattner Institute, University of Potsdam, Potsdam, Germany
- Felix Naumann
- Hasso Plattner Institute, University of Potsdam, Potsdam, Germany
11. Araújo D, Santos Pires CE, Cassimiro Nascimento D. Leveraging active learning to reduce human effort in the generation of ground-truth for entity resolution. Computational Intelligence 2020. DOI: 10.1111/coin.12268.
Affiliation(s)
- Diego Araújo
- Center of Electrical Engineering and Informatics, Federal University of Campina Grande, Paraíba, Brazil
- Center of Exact and Applied Social Sciences, State University of Paraíba, Paraíba, Brazil
- Dimas Cassimiro Nascimento
- Center of Electrical Engineering and Informatics, Federal University of Campina Grande, Paraíba, Brazil
- Academic Unit of Garanhuns, Federal Rural University of Pernambuco, Pernambuco, Brazil
13. Bisandu DB, Prasad R, Liman MM. Data clustering using efficient similarity measures. Journal of Statistics & Management Systems 2019. DOI: 10.1080/09720510.2019.1565443.
Affiliation(s)
- Desmond Bala Bisandu
- Department of Computer Science, University of Jos, P.M.B. 2084 Jos, Plateau State 930001, Nigeria
- Rajesh Prasad
- Department of Computer Science, African University of Science and Technology, P.M.B. 681 Garki, Abuja F.C.T., Nigeria
- Musa Muhammad Liman
- Department of Computer Science, Universiti Putra Malaysia, 43400 Serdang, Selangor, Malaysia
16. van Gennip Y, Hunter B, Ma A, Moyer D, de Vera R, Bertozzi AL. Unsupervised record matching with noisy and incomplete data. International Journal of Data Science and Analytics 2018. DOI: 10.1007/s41060-018-0129-7.
17. Jurek A, Hong J, Chi Y, Liu W. A novel ensemble learning approach to unsupervised record linkage. Information Systems 2017. DOI: 10.1016/j.is.2017.06.006.
18. Sagi T, Gal A, Barkol O, Bergman R, Avram A. Multi-source uncertain entity resolution: Transforming Holocaust victim reports into people. Information Systems 2017. DOI: 10.1016/j.is.2016.12.003.
19. Sohail A, Yousaf MM. A proficient cost reduction framework for de-duplication of records in data integration. BMC Medical Informatics and Decision Making 2016; 16:42. PMID: 27067004; PMCID: PMC4828843; DOI: 10.1186/s12911-016-0280-9. Open access.
Abstract
Background: Record de-duplication is the process of identifying records that refer to the same entity. It plays a pivotal role in data mining applications involving the integration of multiple data sources and data cleansing, and it is a challenging task due to its computational complexity and the variations in data representation across different sources. Blocking and windowing are the commonly used methods for reducing the number of record comparisons during de-duplication; both require tuning a certain set of parameters, such as the choice of a particular variant of blocking or windowing, or the selection of an appropriate window size for different datasets.
Methods: In this paper, we propose a framework that employs blocking and windowing techniques in succession, so that figuring out these parameters is not required. We also evaluate the impact of different configurations on dirty and massively dirty datasets. To evaluate the proposed framework, experiments were performed using Febrl (Freely Extensible Biomedical Record Linkage).
Results: The proposed framework is comprehensively evaluated using a variety of quality and complexity measures, such as reduction ratio, precision, and recall. It is observed that the framework significantly reduces the number of record comparisons.
Conclusions: The selection of the linkage key is a critical performance factor for record linkage.
Electronic supplementary material: The online version of this article (doi:10.1186/s12911-016-0280-9) contains supplementary material, which is available to authorized users.
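Two of the standard measures mentioned above can be computed directly; the record and candidate-pair counts below are invented for illustration.

```python
def reduction_ratio(n_records: int, candidate_pairs: int) -> float:
    """Fraction of the full quadratic comparison space that blocking
    and windowing removed."""
    total_pairs = n_records * (n_records - 1) // 2
    return 1 - candidate_pairs / total_pairs

def pairs_completeness(true_matches_found: int, true_matches_total: int) -> float:
    """Share of the true duplicate pairs that survive into the
    candidate set (the recall of the blocking/windowing step)."""
    return true_matches_found / true_matches_total

# 10,000 records give ~50 million possible pairs. Suppose blocking and
# windowing applied in succession leave 120,000 candidate pairs and
# retain 950 of 1,000 known duplicate pairs.
print(round(reduction_ratio(10_000, 120_000), 4))  # 0.9976
print(pairs_completeness(950, 1_000))              # 0.95
```

A good configuration keeps the reduction ratio high without letting pairs completeness drop, which is exactly the trade-off the framework tries to sidestep by chaining the two techniques instead of tuning each one.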
Affiliation(s)
- Asif Sohail
- Punjab University College of Information Technology (PUCIT), University of the Punjab, Lahore, Pakistan
- Muhammad Murtaza Yousaf
- Punjab University College of Information Technology (PUCIT), University of the Punjab, Lahore, Pakistan
21. Randall SM, Boyd JH, Ferrante AM, Bauer JK, Semmens JB. Use of graph theory measures to identify errors in record linkage. Computer Methods and Programs in Biomedicine 2014; 115:55-63. PMID: 24768079; DOI: 10.1016/j.cmpb.2014.03.008.
Abstract
Ensuring high linkage quality is important in many record linkage applications, yet current methods for ensuring quality are manual and resource intensive. This paper seeks to determine the effectiveness of graph theory techniques in identifying record linkage errors. A range of graph theory techniques was applied to two linked datasets with known truth sets, and their ability to identify groups containing errors was compared to that of a widely used threshold-setting technique. The methodology shows promise; however, further investigation of graph theory techniques is required. The development of more efficient and effective methods of improving linkage quality will result in higher-quality datasets that can be delivered to researchers in shorter timeframes.
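One simple graph-theoretic signal for linkage errors is the density of each linked group. The sketch below assumes that groups formed by long chains of weak links (low density) are more error-prone than cliques in which every record pair matched directly; it illustrates the general idea, not the specific measures evaluated in the paper.

```python
from itertools import combinations

def group_density(nodes, edges):
    """Density of a linked group: 1.0 means every record pair in the
    group matched directly; low density suggests the group was formed
    by a chain of weak links and may contain linkage errors."""
    possible = len(nodes) * (len(nodes) - 1) / 2
    present = sum(1 for a, b in combinations(nodes, 2)
                  if (a, b) in edges or (b, a) in edges)
    return present / possible if possible else 1.0

# Group A: every pair matched directly (a clique) -> density 1.0.
clique = group_density({"r1", "r2", "r3"},
                       {("r1", "r2"), ("r1", "r3"), ("r2", "r3")})
# Group B: records glued together by a single chain -> density 2/3.
chain = group_density({"r4", "r5", "r6"}, {("r4", "r5"), ("r5", "r6")})
print(clique, round(chain, 2))
```

Groups with low density can then be prioritised for the manual clerical review that the paper aims to reduce.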
Affiliation(s)
- Sean M Randall
- Centre for Data Linkage, Curtin University, Kent Street, Bentley, WA 6102, Australia
- James H Boyd
- Centre for Data Linkage, Curtin University, Kent Street, Bentley, WA 6102, Australia
- Anna M Ferrante
- Centre for Data Linkage, Curtin University, Kent Street, Bentley, WA 6102, Australia
- Jacqueline K Bauer
- Centre for Data Linkage, Curtin University, Kent Street, Bentley, WA 6102, Australia
- James B Semmens
- Centre for Data Linkage, Curtin University, Kent Street, Bentley, WA 6102, Australia
23. Vatsalan D, Christen P, Verykios VS. A taxonomy of privacy-preserving record linkage techniques. Information Systems 2013. DOI: 10.1016/j.is.2012.11.005.
25. Panse F, van Keulen M, Ritter N. Indeterministic Handling of Uncertain Decisions in Deduplication. ACM Journal of Data and Information Quality 2013. DOI: 10.1145/2435221.2435225.
Abstract
In current research and practice, deduplication is usually treated as a deterministic approach in which database tuples are either declared to be duplicates or not. In ambiguous situations, however, it is often not completely clear-cut which tuples represent the same real-world entity, and deterministic approaches may ignore many realistic possibilities, which in turn can lead to false decisions. In this article, we present an indeterministic approach to deduplication that uses a probabilistic target model, including techniques for the proper probabilistic interpretation of similarity matching results. Instead of deciding for one of the most likely situations, all realistic situations are modelled in the resulting data, which minimises the negative impact of false decisions. Moreover, the deduplication process becomes almost fully automatic, and human effort can be largely reduced. To increase applicability, we introduce several semi-indeterministic methods that heuristically reduce the set of indeterministically handled decisions in several meaningful ways, and we describe a fully indeterministic method for theoretical and presentational reasons.
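The indeterministic idea — keeping every realistic interpretation of an ambiguous match, each weighted by a probability, instead of forcing a yes/no decision — can be sketched as follows. The thresholds and the possible-world representation are illustrative assumptions, not the article's model.

```python
def possible_worlds(pair, match_prob, low=0.2, high=0.8):
    """Indeterministic handling of a candidate pair: confident scores
    yield a single world, ambiguous scores keep BOTH interpretations,
    each carrying its probability forward into the resulting data."""
    if match_prob >= high:
        return [({"merged": pair}, 1.0)]
    if match_prob <= low:
        return [({"separate": pair}, 1.0)]
    # Ambiguous zone: do not decide; model both realistic situations.
    return [({"merged": pair}, match_prob),
            ({"separate": pair}, 1.0 - match_prob)]

worlds = possible_worlds(("t1", "t2"), match_prob=0.6)
for world, p in worlds:
    print(world, p)
```

The semi-indeterministic methods described above correspond to narrowing the ambiguous zone (raising `low`, lowering `high`) so that fewer pairs are carried forward as multiple worlds.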
26. A Supervised Machine Learning Approach for Duplicate Detection over Gazetteer Records. Geospatial Semantics 2011. DOI: 10.1007/978-3-642-20630-6_3.