Sentiment analysis for Arabic
Date
2015-12
Publisher
The British University in Dubai (BUiD)
Abstract
Named Entity Recognition, Question Answering, Information Retrieval and
Machine Translation are among the tasks addressed with Natural Language
Processing approaches. Sentiment Analysis likewise uses Natural Language
Processing, among other means, to identify subjective text and its negative,
positive or neutral polarity. Sentiment Analysis, which combines text mining and
natural language processing, has gained considerable attention due to the
increased use of social media websites such as Facebook, Instagram and Twitter.
Although Sentiment Analysis is a growing field, far more research has been done
on English than on Arabic. Sentiment analysis helps companies, governments and
other organizations improve their products and services based on reviews or
comments. This paper not only describes the various challenges faced by Arabic
Natural Language Processing in the Sentiment Analysis task, but also presents
an innovative approach that explores the role of lexicalization in Arabic
sentiment analysis. Sentiment Analysis in Arabic is hindered by a lack of
resources, by the limited language coverage of sentiment lexicons, by the need
to pre-process datasets and, as a major concern, by the repeated use of the same
approaches. The first of the key solutions found to resolve these problems is to
extend the lexicon with more words that are not restricted to Modern Standard Arabic.
The second is to avoid pre-processing the dataset. The third, and the most
important one, is to investigate the development of an Arabic Sentiment Analysis
system using a novel rule-based approach. This approach uses heuristic rules that
are triggered by an end-to-end mechanism for a particular word, in a manner that
accurately classifies tweets as positive or negative. A series of abstractions
results in an end-to-end rule-based chaining approach. For each lexicon, this
chain follows a sequence of rules (i.e. rule A chains with rule B, and if required
with rule C, and so on), with appropriate positioning and
prioritization of rules. Developing these rules was expensive in terms of time and
effort, but it resulted in outstanding results. Experiments were conducted on two
datasets, chosen for several reasons: they are available and have been used
successfully by other researchers, they are rich and sufficient to draw
conclusions, and they are provided with electronic resources such as lexicons.
Two sets of experiments were carried out. The first set used only two rules,
“equal to” and “within the text”; the second set used the rule-chaining
mechanism. The end-to-end rule-chaining approach achieved 93.9% accuracy when
tested on one dataset, which is considered the baseline, and 85.6% accuracy on
OCA, the second dataset. A further comparison with the baseline showed a large
increase in accuracy of 23.85%.
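The abstract names the rules “equal to” and “within the text” and describes chaining rule A to rule B and, if required, rule C, but does not define them further. The sketch below is only a rough illustration under stated assumptions: “equal to” is read as an exact match between a tweet token and a lexicon entry, “within the text” as the entry occurring inside a token (for example with an attached clitic), tried in that priority order, and a third chained rule flips polarity after a negation particle. The toy lexicon, the negation list, the sample tweets, rule C itself and every function name (`rule_equal_to`, `rule_within_text`, `match_polarity`, `classify`, `accuracy`) are hypothetical and are not taken from the dissertation; the accuracies it prints come from made-up data, not from the reported experiments.

```python
from typing import Dict, List, Optional

# Hypothetical toy lexicon (word -> polarity: +1 positive, -1 negative).
# These entries are illustrative only, not the dissertation's lexicon.
LEXICON: Dict[str, int] = {"جميل": 1, "رائع": 1, "سيء": -1, "فاشل": -1}

# Hypothetical negation particles used by the illustrative chained rule C.
NEGATIONS = {"لا", "ليس", "ليست", "غير"}


def rule_equal_to(token: str, entry: str) -> bool:
    """Rule A, assumed reading of "equal to": exact match with the entry."""
    return token == entry


def rule_within_text(token: str, entry: str) -> bool:
    """Rule B, assumed reading of "within the text": the entry occurs inside
    the token, e.g. with an attached clitic as in "والجميل"."""
    return entry in token


def match_polarity(token: str) -> Optional[int]:
    """Try rule A against every entry first, then fall back to rule B."""
    for rule in (rule_equal_to, rule_within_text):  # priority order
        for entry, polarity in LEXICON.items():
            if rule(token, entry):
                return polarity
    return None  # no lexicon entry matched this token


def classify(tweet: str, chain_negation: bool = True) -> str:
    """Sum token polarities; optionally chain rule C after A or B has fired."""
    tokens = tweet.split()
    score = 0
    for i, token in enumerate(tokens):
        polarity = match_polarity(token)
        if polarity is None:
            continue
        # Hypothetical rule C: only reached once rule A or B has matched;
        # a directly preceding negation particle flips the polarity.
        if chain_negation and i > 0 and tokens[i - 1] in NEGATIONS:
            polarity = -polarity
        score += polarity
    # Assumption: a score of 0 (including no matches) defaults to "negative".
    return "positive" if score > 0 else "negative"


def accuracy(predictions: List[str], gold: List[str]) -> float:
    """Fraction of predictions that equal the gold labels."""
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


if __name__ == "__main__":
    # Made-up sample tweets and labels, for illustration only.
    tweets = ["الفيلم جميل", "الخدمة ليست رائعة", "المنتج سيء", "التطبيق فاشل"]
    gold = ["positive", "negative", "negative", "negative"]
    two_rules = [classify(t, chain_negation=False) for t in tweets]  # A, B only
    chained = [classify(t) for t in tweets]  # A, B, then C
    print(f"two rules only: {accuracy(two_rules, gold):.2f}")  # two rules only: 0.75
    print(f"rule chaining:  {accuracy(chained, gold):.2f}")  # rule chaining:  1.00
```

Trying rule A before rule B is one way to read “appropriate positioning and prioritization of rules”: the stricter exact match wins whenever it applies, and the looser substring rule fires only as a fallback. With a real lexicon, that fallback would need guarding against short entries matching inside unrelated words.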
Description
DISSERTATION WITH DISTINCTION
Keywords
named entity recognition, information retrieval, machine translation, natural language processing, Arabic sentiment analysis