Rastegar-Mojarad M, Kadolph C, Ye Z, Wall D, Murali N, Lin S. A fuzzy-match search engine for physician directories.
JMIR Med Inform 2014;
2:e30. [PMID:
25601050 PMCID:
PMC4288075 DOI:
10.2196/medinform.3463]
[Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/09/2014] [Revised: 09/06/2014] [Accepted: 09/16/2014] [Indexed: 12/02/2022] Open
Abstract
Background
A search engine to find physicians’ information is a basic but crucial function of a health care provider’s website. Inefficient search engines, which return no results or incorrect results, can lead to patient frustration and potential customer loss. A search engine that can handle misspellings and spelling variations of names is needed, as the United States (US) has culturally, racially, and ethnically diverse names.
Objective
The Marshfield Clinic website provides a search engine for users to search for physicians’ names. The current search engine provides an auto-completion function, but it requires an exact match. We observed that 26% of all searches yielded no results. The goal was to design a fuzzy-match algorithm to aid users in finding physicians easier and faster.
Methods
Instead of an exact match search, we used a fuzzy algorithm to find similar matches for searched terms. In the algorithm, we solved three types of search engine failures: “Typographic”, “Phonetic spelling variation”, and “Nickname”. To solve these mismatches, we used a customized Levenshtein distance calculation that incorporated Soundex coding and a lookup table of nicknames derived from US census data.
Results
Using the “Challenge Data Set of Marshfield Physician Names,” we evaluated the accuracy of fuzzy-match engine–top ten (90%) and compared it with exact match (0%), Soundex (24%), Levenshtein distance (59%), and fuzzy-match engine–top one (71%).
Conclusions
We designed, created a reference implementation, and evaluated a fuzzy-match search engine for physician directories. The open-source code is available at the codeplex website and a reference implementation is available for demonstration at the datamarsh website.
Collapse