
A Character-Level Restoration of Sukhothai Inscriptions Using the Mask Language Model

EasyChair Preprint no. 11029

6 pages. Date: October 6, 2023

Abstract

Stone inscriptions are a form of written record that preserves historical accounts and expresses the cultural identity of their era. They were made by engraving each character into stone with a sharp metal tool until complete sentences were formed, so that readers could understand the message. The completeness of these sentences is therefore of great importance for natural language processing tasks. When stone inscriptions are transcribed, however, some parts are often found to be uninterpretable. Over time, the inscriptions may have deteriorated from various causes, such as scratches across the text, faded markings, or damage from natural disasters, making it impossible to determine which specific characters were lost. To restore the completeness of damaged sentences, this research builds models that predict missing characters from the surrounding text. It applies a masked language model to the experimental data using three multilingual pre-trained models: (1) XLM-RoBERTa, (2) Bert-base-multilingual-cased, and (3) DistilBERT-base-multilingual-cased. In each training round, random characters are masked with the token “<mask>” or “[MASK]”, and the model is prompted to predict what belongs at the masked positions. The experimental results show prediction accuracies of 42, 53, and 50 percent for models (1), (2), and (3), respectively.
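The masking step described above can be illustrated with a minimal sketch. It assumes that masking happens at the level of Unicode code points (in Thai script, combining vowels and tone marks are separate code points) and that accuracy is the fraction of masked positions predicted exactly. The function names `mask_characters` and `accuracy`, the 15% masking ratio, and the seed handling are hypothetical illustrations, not the authors' implementation. Real pre-trained models such as XLM-RoBERTa operate on subword tokens, so the exact way the paper aligns characters with model tokens is not shown here.

```python
import random


def mask_characters(text, mask_token="[MASK]", ratio=0.15, seed=None):
    """Replace a random subset of non-whitespace characters with mask_token.

    Hypothetical helper: returns the masked text and a list of
    (position, original_character) pairs that the model should recover.
    """
    rng = random.Random(seed)
    # Only non-whitespace code points are candidates for masking.
    candidates = [i for i, ch in enumerate(text) if not ch.isspace()]
    if not candidates:
        return text, []
    # Mask at least one character, and never more than are available.
    k = min(len(candidates), max(1, round(len(candidates) * ratio)))
    chosen = sorted(rng.sample(candidates, k))
    chars = list(text)
    targets = []
    for i in chosen:
        targets.append((i, chars[i]))
        chars[i] = mask_token
    return "".join(chars), targets


def accuracy(predictions, targets):
    """Fraction of masked positions whose predicted character equals the original."""
    if not targets:
        return 0.0
    correct = sum(1 for pred, (_, orig) in zip(predictions, targets) if pred == orig)
    return correct / len(targets)


if __name__ == "__main__":
    # Opening words of the Ramkhamhaeng inscription in modern Thai script.
    line = "พ่อกูชื่อศรีอินทราทิตย์"
    masked, targets = mask_characters(line, mask_token="<mask>", ratio=0.2, seed=0)
    print(masked)
    print(targets)
```

The mask token is a parameter because XLM-RoBERTa uses `<mask>` while the BERT-family models use `[MASK]`; a trained model's predictions for each masked position would then be passed to `accuracy` together with `targets`.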

Keyphrases: Bidirectional Encoder Representations from Transformers (BERT), Mask Language Model, Natural Language Processing, transformer

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@Booklet{EasyChair:11029,
  author = {Sujitra Tongkhum and Sukree Sinthupinyo},
  title = {A Character-Level Restoration of Sukhothai Inscriptions Using the Mask Language Model},
  howpublished = {EasyChair Preprint no. 11029},

  year = {EasyChair, 2023}}