
An ensemble model for sentiment classification on code-mixed data in Dravidian Languages

EasyChair Preprint no. 7266

9 pages · Date: December 27, 2021


The Dravidian languages Tamil, Kannada, Malayalam and Telugu are spoken by over 220 million people, yet they remain vastly under-resourced for natural language processing tasks. Code-switching and code-mixing are on the rise, as multilingual speakers increasingly express their opinions in their mother tongue mixed with English, in both written text and speech. Sentiment analysis of code-switched Dravidian text is challenging because corpora are scarce and the languages are interspersed irregularly. This paper applies an ensemble sentiment classification strategy based on majority voting over 13 different classification models to the Dravidian code-mixed dataset provided in FIRE 2021. The dataset consists of YouTube comments whose average length is under 7 words. The key conclusion from our experiments is that the ensemble of multiple classifiers outperformed the other approaches for sentiment classification. Our results show that an ensemble of traditional machine learning classifiers can achieve weighted F1-scores of 0.59, 0.65 and 0.60 on Kannada, Malayalam and Tamil code-switched data, respectively.
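The abstract names the combination rule (majority voting) and the reported metric (weighted F1) without defining them. The following is a minimal pure-Python sketch of both, assuming each classifier has already produced one sentiment label per comment. The function names `majority_vote` and `weighted_f1`, the tie-breaking rule and the toy labels are hypothetical illustrations, not the authors' implementation.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def majority_vote(predictions: Sequence[Sequence[Hashable]]) -> list:
    """Hard majority voting: predictions[i][j] is classifier i's label for comment j."""
    if not predictions:
        return []
    combined = []
    for j in range(len(predictions[0])):
        votes = [clf_preds[j] for clf_preds in predictions]
        counts = Counter(votes)
        top = max(counts.values())
        # Tie-break (assumption, not stated in the paper): keep the tied label
        # whose first vote comes from the earliest classifier in the list.
        combined.append(next(v for v in votes if counts[v] == top))
    return combined


def weighted_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Per-class F1 averaged with weights equal to each class's share of y_true."""
    total = len(y_true)
    pairs = list(zip(y_true, y_pred))
    score = 0.0
    for label in set(y_true):
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * (tp + fn) / total  # tp + fn is the support of this label
    return score


# Toy example (hypothetical data): three classifiers labelling four comments.
clf_a = ["positive", "negative", "positive", "mixed_feelings"]
clf_b = ["positive", "positive", "negative", "mixed_feelings"]
clf_c = ["negative", "negative", "negative", "positive"]
gold = ["positive", "negative", "positive", "mixed_feelings"]

ensemble = majority_vote([clf_a, clf_b, clf_c])
print(ensemble)  # ['positive', 'negative', 'negative', 'mixed_feelings']
print(round(weighted_f1(gold, ensemble), 2))  # 0.75
```

With more than two sentiment labels, ties can occur even among an odd number of voters such as 13, so any real system needs an explicit tie rule. scikit-learn's `VotingClassifier` with `voting="hard"` packages the same idea for fitted estimators, though it breaks ties by the ascending sort order of class labels rather than by classifier order.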

Keyphrases: code-mixing, code-switching, Dravidian, Kanglish, Manglish, sentiment classification, Tanglish

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@booklet{EasyChair:7266,
  author = {S R Mithun Kumar and Nihal Reddy and Aruna Malapati and Lov Kumar},
  title = {An ensemble model for sentiment classification on code-mixed data in Dravidian Languages},
  howpublished = {EasyChair Preprint no. 7266},
  year = {EasyChair, 2021}}