Guided Inductive Logic Programming: Cleaning Knowledge Bases with Iterative User Feedback

15 pages • Published: April 27, 2020

Yan Wu, Jinchuan Chen, Plarent Haxhidauti, Vinu Ellampallil Venugopal and Martin Theobald

Abstract

Domain-oriented knowledge bases (KBs) such as DBpedia and YAGO are largely constructed by applying a set of predefined extraction rules to the semi-structured contents of Wikipedia articles. Although both of these large-scale KBs achieve very high average precision values (above 95% for YAGO3), subtle mistakes in a few of the underlying extraction rules may still introduce a substantial number of systematic extraction mistakes for specific relations. For example, by applying the same regular expressions to extract the names of persons of both Asian and Western nationality, YAGO erroneously swaps most of the family and given names of Asian person entities. For traditional rule-learning approaches based on Inductive Logic Programming (ILP), it is very difficult to detect these systematic extraction mistakes, since they usually occur only in a relatively small subdomain of the relations’ arguments. In this paper, we thus propose a guided form of ILP, coined “GILP”, that iteratively asks for small amounts of user feedback over a given KB to learn a set of data-cleaning rules that (1) best match the feedback and (2) also generalize to a larger portion of facts in the KB. We propose both algorithms and respective metrics to automatically assess the quality of the learned rules with respect to the user feedback.
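To make the two criteria concrete, the following is a minimal sketch of how a single candidate cleaning rule could be scored against user feedback and against the full KB. It is not the algorithm or the metrics from the paper: the names Rule and score_rule, the choice of precision, recall and coverage as scores, and the toy facts and nationalities are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A fact is a (subject, predicate, object) triple. All data below is made up.
Fact = Tuple[str, str, str]


@dataclass
class Rule:
    """Hypothetical cleaning rule: flags facts it considers erroneous."""
    name: str
    matches: Callable[[Fact], bool]


def score_rule(rule: Rule, feedback: Dict[Fact, bool], kb: List[Fact]):
    """Score a rule against feedback (fact -> True if the user marked it wrong).

    Returns (precision, recall, coverage):
      precision: share of flagged feedback facts that the user marked wrong
      recall:    share of user-marked errors that the rule flags
      coverage:  share of all KB facts that the rule flags (generalization)
    """
    flagged = [f for f in feedback if rule.matches(f)]
    true_pos = sum(1 for f in flagged if feedback[f])
    errors = sum(1 for is_wrong in feedback.values() if is_wrong)
    precision = true_pos / len(flagged) if flagged else 0.0
    recall = true_pos / errors if errors else 0.0
    coverage = sum(1 for f in kb if rule.matches(f)) / len(kb) if kb else 0.0
    return precision, recall, coverage


# Toy KB with placeholder entities (assumption, not taken from YAGO).
kb = [
    ("p1", "hasFamilyName", "x"),
    ("p2", "hasFamilyName", "y"),
    ("p3", "hasFamilyName", "z"),
    ("p4", "hasFamilyName", "w"),
    ("p1", "bornIn", "c"),
]
nationality = {"p1": "CN", "p2": "CN", "p3": "DE", "p4": "JP"}

# Candidate rule: family-name facts of persons from selected countries are suspect.
rule = Rule(
    name="suspect_family_name",
    matches=lambda f: f[1] == "hasFamilyName"
    and nationality.get(f[0]) in {"CN", "JP"},
)

# The user has inspected only two facts so far.
feedback = {kb[0]: True, kb[2]: False}

precision, recall, coverage = score_rule(rule, feedback, kb)
print(precision, recall, coverage)
```

In an iterative setting, a loop would presumably re-score candidate rules after each new batch of feedback and keep those that fit the feedback while still covering many KB facts; how GILP actually does this is described in the paper itself.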

Keyphrases: data cleaning, feedback, knowledge bases, rule learning

In: Gregoire Danoy, Jun Pang and Geoff Sutcliffe (editors). GCAI 2020. 6th Global Conference on Artificial Intelligence (GCAI 2020), vol 72, pages 92-106.

Links:	https://easychair.org/publications/paper/N3D1
	https://doi.org/10.29007/ppgx

BibTeX entry

@inproceedings{GCAI2020:Guided_Inductive_Logic_Programming,
  author    = {Yan Wu and Jinchuan Chen and Plarent Haxhidauti and Vinu Ellampallil Venugopal and Martin Theobald},
  title     = {Guided Inductive Logic Programming: Cleaning Knowledge Bases with Iterative User Feedback},
  booktitle = {GCAI 2020. 6th Global Conference on Artificial Intelligence (GCAI 2020)},
  editor    = {Gregoire Danoy and Jun Pang and Geoff Sutcliffe},
  series    = {EPiC Series in Computing},
  volume    = {72},
  publisher = {EasyChair},
  bibsource = {EasyChair, https://easychair.org},
  issn      = {2398-7340},
  url       = {https://easychair.org/publications/paper/N3D1},
  doi       = {10.29007/ppgx},
  pages     = {92-106},
  year      = {2020}}
