Sentiment analysis for Arabic
The British University in Dubai (BUiD)
Named Entity Recognition, Question Answering, Information Retrieval and Machine Translation are among the tasks that follow Natural Language Processing approaches, while Sentiment Analysis uses Natural Language Processing as one of the means of finding subjective text that indicates negative, positive or neutral polarity. The combined approach of text mining and natural language processing, termed Sentiment Analysis, has gained considerable prominence due to the increased use of social media websites such as Facebook, Instagram and Twitter. Sentiment Analysis is a growing field; nevertheless, far more research has been done on English than on Arabic. Sentiment Analysis helps companies, governments and other organizations to improve their products and services based on reviews or comments. This paper not only describes the various challenges faced by Arabic Natural Language Processing in the Sentiment Analysis task, but also presents an innovative approach that explores the role of lexicalization for Arabic sentiment analysis. Sentiment Analysis in Arabic is hindered by a lack of resources, by the language variety covered by sentiment lexicons, by the assumption that pre-processing of the dataset is a must and, as a major concern, by the repeated use of the same approaches. The key solutions proposed to resolve these problems are, first, extending the lexicon to include more words, not restricted to Modern Standard Arabic; second, avoiding pre-processing of the dataset; and third, and most important, investigating the development of an Arabic Sentiment Analysis system using a novel rule-based approach. This approach uses heuristic rules that are triggered through an end-to-end mechanism for a particular word, in a manner that accurately classifies tweets as positive or negative. A series of abstractions results in an end-to-end rule-based chaining approach. For each lexicon, this chain follows a specific chaining of rules (i.e., rule A chains with rule B and, if required, rule C, and so on), with appropriate positioning and prioritization of rules. The rules, though expensive in terms of time and effort, yielded outstanding results. Experiments were conducted on two datasets, chosen for several reasons: they are available and have been used successfully by other researchers, they are rich and sufficient to support a conclusion, and they are provided with electronic resources such as a lexicon. Two sets of experiments were carried out. The first set used only two rules, “equal to” and “within the text”. The second set used the rule-chaining mechanism. The end-to-end rule-chaining approach achieved 93.9% accuracy when tested on one dataset, which is considered the baseline, and 85.6% accuracy on OCA, the second dataset. A further comparison with the baseline showed an increase in accuracy of 23.85%.
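The rule chaining described above can be illustrated with a minimal Python sketch. It is not the dissertation's implementation. The four-word lexicon, the function names, the scoring by counting matches and the "neutral" fallback are all assumptions made for illustration. Only the two rule names, "equal to" and "within the text", and the idea of trying rules in a prioritized order come from the abstract.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical toy lexicon (assumption): the actual lexicon is not reproduced here.
LEXICON = {
    "رائع": "positive",   # wonderful
    "ممتاز": "positive",  # excellent
    "ممل": "negative",    # boring
    "فاشل": "negative",   # unsuccessful
}


@dataclass
class Rule:
    name: str
    fires: Callable[[str, str], bool]  # (lexicon entry, tweet text) -> bool


def equal_to(entry: str, text: str) -> bool:
    """Rule A: some whitespace-separated token equals the lexicon entry."""
    return entry in text.split()


def within_text(entry: str, text: str) -> bool:
    """Rule B: the lexicon entry occurs anywhere in the raw text."""
    return entry in text


# Position in this list is the rule's priority (assumed ordering).
CHAIN = [Rule("equal to", equal_to), Rule("within the text", within_text)]


def first_firing_rule(entry: str, text: str) -> Optional[str]:
    """Try each rule in priority order and return the name of the first that fires."""
    for rule in CHAIN:
        if rule.fires(entry, text):
            return rule.name
    return None


def classify(text: str) -> str:
    """Score a text by counting lexicon entries matched by any rule in the chain."""
    score = 0
    for entry, polarity in LEXICON.items():
        if first_firing_rule(entry, text) is not None:
            score += 1 if polarity == "positive" else -1
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # assumption: ties and texts with no match are left neutral


if __name__ == "__main__":
    for tweet in ["فيلم رائع", "القصة مملة"]:
        print(tweet, classify(tweet))
```

In this sketch the text is matched as written, with no stemming or normalization, which mirrors the stated aim of avoiding pre-processing. The second example is caught only by the "within the text" rule, because the lexicon entry appears inside a longer inflected word.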
DISSERTATION WITH DISTINCTION
named entity recognition, information retrieval, machine translation, natural language processing, Arabic sentiment analysis