
A Combined Semantic Search and Machine Learning Approach for Address Entity Resolution

EasyChair Preprint no. 832

7 pages
Date: March 16, 2019

Abstract

We have developed a prototype solution for a specific use case: entity resolution for the mailing addresses of financial institutions. Our objective was to match misspelled or inaccurate user-entered addresses of business entities to their corresponding entries in a “gold copy” (dictionary) of complete and accurate mailing addresses. Three distinct matching methods (PySolr, SoDA and Record Linkage) provide a preliminary yet diverse set of lookups for finding matches. These lookups may optionally be followed by a search using a hybrid machine learning (ML) model that combines regularized logistic regression and hierarchical clustering, built using Dedupe. Our measurements of elapsed search times for the three lookup methods across a variety of match types suggest that the majority of the simpler matches are detected extremely fast (elapsed times: ~6–48 milliseconds) at the lookup stage, which makes this stage suitable for detecting simple, and possibly the most common, errors in user-entered mailing addresses. The ML model, on the other hand, is comparatively slower (elapsed times: ~174–201 milliseconds). Nevertheless, the hybrid ML model seems most suitable where a user-entered address contains multiple ambiguities and the preliminary lookup methods may therefore fail to detect possible matches. The precision and recall of the ML model on a sizeable test dataset are 0.89 and 0.94, respectively. These high scores suggest that ML models can be applied successfully to entity resolution of mailing addresses. Our combined solution can be integrated with enterprise software applications to provide an efficient and robust address matching service in cases where users enter mailing addresses as free-form text that may contain inaccuracies.
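The two-stage flow described in the abstract (a fast lookup first, a slower fallback only when the lookup fails) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only the Python standard library, an exact match on a normalized string stands in for the PySolr, SoDA and Record Linkage lookups, and a `difflib` similarity score stands in for the Dedupe-based ML model. The addresses, the function names and the 0.8 threshold are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical "gold copy" entries, invented for illustration only.
GOLD = [
    "100 Main Street, Springfield, IL 62701",
    "250 Market Avenue, Columbus, OH 43215",
]


def normalize(address):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", address.lower())
    return " ".join(cleaned.split())


# Map each normalized form back to its gold-copy entry.
GOLD_INDEX = {normalize(a): a for a in GOLD}


def fast_lookup(query):
    """Stage 1: exact match on the normalized form.

    A stand-in for the paper's lookup methods, not a reimplementation.
    """
    return GOLD_INDEX.get(normalize(query))


def fuzzy_fallback(query, threshold=0.8):
    """Stage 2: best string-similarity match at or above a threshold.

    A stand-in for the paper's ML model; the threshold is an assumption.
    """
    q = normalize(query)
    best, best_score = None, 0.0
    for key, original in GOLD_INDEX.items():
        score = SequenceMatcher(None, q, key).ratio()
        if score > best_score:
            best, best_score = original, score
    return best if best_score >= threshold else None


def resolve(query):
    """Try the cheap lookup first; use the slower fallback only on a miss."""
    match = fast_lookup(query)
    if match is not None:
        return match, "lookup"
    return fuzzy_fallback(query), "fallback"
```

In this sketch, an entry that differs from the gold copy only in case or punctuation is resolved at the lookup stage, while an entry with several misspellings falls through to the similarity stage, mirroring the division of labor the abstract reports.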

Keyphrases: address entity resolution, Entity Resolution, entity resolution problem, gap distance, machine learning, matching method, Natural Language Processing, semantic search

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@Booklet{EasyChair:832,
  author = {Anne Moshyedi and Taylor Kramer and Amitava Gangopadhyay and Sujit Pal},
  title = {A Combined Semantic Search and Machine Learning Approach for Address Entity Resolution},
  howpublished = {EasyChair Preprint no. 832},
  year = {EasyChair, 2019}}