The Impact of Arabic Part of Speech Tagging on Sentiment Analysis: A New Corpus and Deep Learning Approach
Date
2021-03-26
Publisher
Elsevier
Abstract
Sentiment analysis uses Natural Language Processing (NLP) techniques and is widely applied to social media content to determine people's opinions, attitudes, and emotions toward entities, individuals, issues, events, or topics. The accuracy of sentiment analysis depends on automatic Part-of-Speech (PoS) tagging, which labels words according to their grammatical categories. Analyzing the Arabic language has attracted considerable research interest, and the challenge is now amplified by the dialects used on social media. While numerous morphological analyzers and PoS taggers have been proposed for Modern Standard Arabic (MSA), we are now witnessing increased interest in applying these techniques to the Arabic dialect that is prominent on social media. Indeed, social media texts (e.g., posts, comments, and replies) differ significantly from MSA texts in vocabulary and grammatical structure. Such differences call for revisiting PoS tagging methods to adapt them to social media texts. Furthermore, the lack of sufficiently large and diverse social media text corpora is one of the reasons that automatic PoS tagging of social media content has rarely been studied. In this paper, we address these limitations by proposing a novel Arabic social media text corpus enriched with complete PoS information, including tags, lemmas, and synonyms. The proposed corpus is the largest manually annotated Arabic corpus to date, with more than 5 million tokens, 238,600 MSA texts, and words from Arabic social media dialects, collected from 65,000 online users' accounts. We used the corpus to train a custom Long Short-Term Memory (LSTM) deep learning model, which showed excellent performance in terms of sentiment classification accuracy and F1-score. The results demonstrate that using a diverse corpus enriched with PoS information significantly enhances the performance of social media analysis techniques and opens the door to advanced features such as opinion mining and emotion intelligence.
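The abstract does not specify how the PoS annotations are fed to the LSTM model. One common approach, sketched below in Python purely as an assumption, is to encode each sentence as two parallel integer sequences, one for lemmas and one for PoS tags, which a model can embed separately and concatenate at each time step. The `AnnotatedToken` record, the `Vocab` class, the `build_vocabs` and `encode` helpers, and the tag names are all hypothetical and are not taken from the paper's corpus format.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotatedToken:
    # Hypothetical record layout: the abstract says each word carries a PoS tag,
    # a lemma, and synonyms, but the corpus's actual file format is not given.
    word: str
    pos: str
    lemma: str
    synonyms: list[str] = field(default_factory=list)


class Vocab:
    """Maps strings to integer ids; id 0 is reserved for padding, 1 for unknown items."""

    def __init__(self) -> None:
        self.index = {"<pad>": 0, "<unk>": 1}

    def add(self, item: str) -> int:
        # Assign the next free id the first time an item is seen.
        if item not in self.index:
            self.index[item] = len(self.index)
        return self.index[item]

    def get(self, item: str) -> int:
        # Unseen items map to the <unk> id.
        return self.index.get(item, 1)


def build_vocabs(sentences):
    """Build separate lemma and PoS-tag vocabularies from annotated sentences."""
    lemma_vocab, pos_vocab = Vocab(), Vocab()
    for sentence in sentences:
        for tok in sentence:
            lemma_vocab.add(tok.lemma)
            pos_vocab.add(tok.pos)
    return lemma_vocab, pos_vocab


def encode(sentence, lemma_vocab, pos_vocab, max_len):
    """Return parallel (lemma ids, PoS ids), truncated or zero-padded to max_len."""
    lemma_ids = [lemma_vocab.get(t.lemma) for t in sentence][:max_len]
    pos_ids = [pos_vocab.get(t.pos) for t in sentence][:max_len]
    pad = max_len - len(lemma_ids)
    return lemma_ids + [0] * pad, pos_ids + [0] * pad


if __name__ == "__main__":
    # Toy dialectal sentence "the movie is nice"; tags and lemmas are illustrative only.
    sent = [
        AnnotatedToken("الفيلم", "NOUN", "فيلم"),
        AnnotatedToken("حلو", "ADJ", "حلو", ["جميل"]),
    ]
    lemma_vocab, pos_vocab = build_vocabs([sent])
    print(encode(sent, lemma_vocab, pos_vocab, max_len=4))  # ([2, 3, 0, 0], [2, 3, 0, 0])
```

In a deep learning framework, the two id sequences would typically feed two embedding layers whose outputs are concatenated per token before the LSTM; reserving id 0 for padding lets a masking layer ignore padded positions. The synonyms are carried in the record but unused here; how the paper actually exploits them is not stated in the abstract.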
Keywords
Sentiment Analysis; Part of Speech Tagging; Arabic Language; Dialect Arabic; Neural Network.
Citation
Nerabie, A.M. et al. (2021) “The Impact of Arabic Part of Speech Tagging on Sentiment Analysis: A New Corpus and Deep Learning Approach,” Procedia Computer Science, 184, pp. 148–155.