Sentiment Analysis of Emirati Dialect
Date
2022
Authors
Arwa A. Al Shamsi; Sherief Abdallah
Journal Title
Big Data and Cognitive Computing
Publisher
MDPI
Abstract
Recently, extensive research in the Arabic Natural Language Processing (ANLP)
field has addressed text classification and sentiment analysis, and the number of
studies that target Arabic dialects has increased as well. In this research paper, we constructed the
first manually annotated Emirati-dialect dataset built from Instagram comments. The constructed
dataset consisted of more than 70,000 comments, mostly written in the Emirati dialect, of which
70,000 were annotated for text polarity as positive, negative, or neutral. The dataset was
also annotated for dialect type: Emirati dialect, other Arabic dialects, or Modern Standard Arabic (MSA).
Preprocessing and TF-IDF feature extraction were applied to the constructed Emirati
dataset to prepare it for the sentiment analysis experiments and to improve classification
performance. The experiments were carried out on both balanced and unbalanced versions of the
dataset using several machine learning classifiers, and they were evaluated with accuracy, recall,
precision, and F-measure. The best accuracy, 80.80%, was achieved when an ensemble model was
applied to sentiment classification of the unbalanced dataset.
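The abstract names TF-IDF feature extraction without giving the exact weighting. As a minimal sketch, assuming whitespace tokenization, term frequency normalized by comment length, and idf = log(N / df) (none of which the abstract specifies), TF-IDF weights for a few toy transliterated comments could be computed as follows. The function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumptions (not from the paper): whitespace tokenization,
    tf = count / document length, idf = log(N / df).
    """
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # A term that appears in every document gets idf = log(1) = 0.
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Toy, made-up comments in Latin transliteration (not from the dataset).
vectors = tf_idf(["wayed helw", "mub helw", "helw"])
print(vectors[0])
```

Because "helw" occurs in every toy comment, its weight is zero everywhere, which illustrates how TF-IDF down-weights terms that do not discriminate between comments.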
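The four evaluation metrics listed in the abstract can be illustrated for the three polarity classes. This is a sketch only: it assumes macro averaging (the unweighted mean over classes), which the abstract does not state, and the function name `evaluate` and the sample labels are hypothetical.

```python
def evaluate(y_true, y_pred, labels=("positive", "negative", "neutral")):
    """Accuracy plus macro-averaged precision, recall, and F-measure.

    Macro averaging is an assumption; the abstract does not say which
    averaging was used.
    """
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f_scores = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # Treat 0/0 as 0 so a class that is never predicted does not raise.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f_scores.append(f)
    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f_measure": sum(f_scores) / n,
    }


print(evaluate(["positive", "positive", "negative", "neutral"],
               ["positive", "negative", "negative", "neutral"]))
```

Macro averaging weights each class equally, so on an unbalanced dataset it can differ noticeably from accuracy; a weighted average would instead follow the class frequencies.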
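The abstract does not describe how the ensemble model combines its member classifiers. One common scheme is hard majority voting, sketched below under that assumption. The per-classifier predictions are supplied as plain label lists, and the function name `majority_vote` is hypothetical.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-classifier label lists by hard majority vote.

    predictions_per_model: one list of labels per classifier, all the same
    length. On a tie, Counter.most_common returns the label first
    encountered, i.e. the one from the earliest classifier in the list.
    """
    combined = []
    for votes in zip(*predictions_per_model):
        label, _ = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical predictions from three classifiers for three comments.
print(majority_vote([
    ["positive", "negative", "neutral"],
    ["positive", "neutral", "neutral"],
    ["negative", "negative", "positive"],
]))
```

Soft voting (averaging predicted class probabilities) or stacking are equally plausible alternatives; this sketch does not imply which one the authors used.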
Keywords
corpus; Emirati dataset; Arabic dialects; sentiment analysis; classification; classifiers
Citation
Arwa A. Al Shamsi and Sherief Abdallah (2022) “Sentiment Analysis of Emirati Dialect,” Big Data and Cognitive Computing, 6(2), p. 57.