Learning Long-text Semantic Similarity with Multi-Granularity Semantic Embedding Based on Knowledge Enhancement

EasyChair Preprint 3167

13 pages • Date: April 13, 2020

Deguang Peng, Bohui Hao, Xianlun Tang, Yingjie Chen and Jian Sun

Abstract

We propose a new method for computing semantic similarity, the multi-granularity semantic embedding model based on knowledge enhancement (MSE based knowledge), to measure the similarity and relevance of long texts in semantic matching. The method first enriches the semantics of each text using the external knowledge base DBpedia, taking both the semantic attributes and the relationships of key entities into account in their vector representations. Second, each long text is represented as a multi-granularity vector: character vectors built with one-dimensional convolution, word vectors built from external knowledge sources and pre-trained word vectors, and sentence vectors built with a bidirectional LSTM. A Siamese network framework then computes the final similarity. To improve results further, we add an attention mechanism after the character vector representation to weight the key characters. Finally, we evaluate the method on two popular datasets, LP50 and MSRP. Experimental results show that the method makes better use of the knowledge in long texts and achieves higher accuracy at a lower time cost.
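
The overall shape of the pipeline described in the abstract (one shared encoder applied to both texts, several granularity vectors concatenated, then a similarity score) can be illustrated with a toy sketch. This is not the authors' model: hashed character trigrams stand in for the one-dimensional convolutional character encoder, hashed word counts stand in for the knowledge-enhanced and pre-trained word vectors, and averaged per-sentence word counts stand in for the bidirectional LSTM. Cosine similarity is likewise an assumption, since the abstract does not say how the Siamese network scores a pair. All names (`DIM`, `hashed_counts`, `encode`, `similarity`) are hypothetical, and the DBpedia enrichment and the attention step are omitted.

```python
import math
import zlib

DIM = 64  # hypothetical size of each granularity vector


def hashed_counts(tokens, dim=DIM):
    """Count vector built with the hashing trick (stand-in for learned embeddings)."""
    vec = [0.0] * dim
    for tok in tokens:
        # crc32 is deterministic across runs, unlike the built-in hash() for str.
        vec[zlib.crc32(tok.encode("utf-8")) % dim] += 1.0
    return vec


def char_view(text):
    # Stand-in for the 1-D convolutional character encoder: character trigrams.
    s = text.lower()
    return hashed_counts(s[i:i + 3] for i in range(len(s) - 2))


def word_view(text):
    # Stand-in for knowledge-enhanced, pre-trained word vectors: word counts.
    return hashed_counts(text.lower().split())


def sentence_view(text):
    # Stand-in for the BiLSTM sentence encoder: mean of per-sentence word counts.
    sentences = [s for s in text.split(".") if s.strip()]
    vecs = [hashed_counts(s.lower().split()) for s in sentences]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def encode(text):
    # Multi-granularity representation: concatenate the three views.
    return char_view(text) + word_view(text) + sentence_view(text)


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity(text_a, text_b):
    # Siamese-style comparison: the same encoder is applied to both inputs.
    return cosine(encode(text_a), encode(text_b))
```

Calling `similarity(doc_a, doc_b)` returns a score between 0 and 1, because every component of the toy vectors is non-negative. In a trained model the three views would come from learned networks with shared weights rather than fixed hashing.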

Keyphrases: Artificial Intelligence, Natural Language Processing, Semantic similarity calculation, deep learning

Links:

https://easychair.org/publications/preprint/X3lf

BibTeX entry

BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:

@booklet{EasyChair:3167,
  author    = {Deguang Peng and Bohui Hao and Xianlun Tang and Yingjie Chen and Jian Sun},
  title     = {Learning Long-text Semantic Similarity with Multi-Granularity Semantic Embedding Based on Knowledge Enhancement},
  howpublished = {EasyChair Preprint 3167},
  year      = {EasyChair, 2020}}
