Dataset built for Arabic Sentiment Analysis

The British University in Dubai (BUiD)
Social media services such as Facebook and Twitter, and media-sharing sites such as Flickr and YouTube, have become increasingly popular in recent years. One key factor in their worldwide appeal is that they allow people to express and share their opinions, likes, and dislikes freely and openly. The opinions posted range from criticizing politicians to discussing top cricket players, commenting on top news, reviewing movies, and recommending new products and services such as mobile phones and restaurants. This development has fueled a new field known as sentiment analysis and opinion mining, whose goal is to extract people's sentiment from text in order to help customers with their purchase decisions and vendors with improving their reputation. This emerging field has attracted considerable research interest, but most of the existing work concentrates on English text, with far less contribution to Arabic. Arabic sentiment analysis depends on datasets and lexicons, and the limited effort devoted to these resources hinders progress in the field. Consequently, in this work we take sentiment analysis of Arabic as the key focus and support researchers in this field by developing a dataset from social networking websites, namely YouTube, Twitter, Facebook, Instagram and Keek, chosen because the Arabic community uses them widely to share opinions and reviews. In particular, we considered reviews and tweets from YouTube, Twitter, Facebook, Instagram and Keek that convey a sentiment. We built a framework that acquires Arabic text from Twitter, Facebook, Instagram and Keek and extracts users' opinions on different topics and products. The framework has three key stages.
We followed an algorithm consisting of a Data Acquisition stage, a Filtering stage and an Annotation stage. In the Data Acquisition stage, we gathered tweets and reviews related to particular topics from Facebook, YouTube, Instagram, Keek and Twitter. In the Filtering stage, we removed items that convey no sentiment, repeated reviews, and spam. The filtered tweets and reviews were then used in the Annotation stage, in which each one was annotated as Positive or Negative. Because no state-of-the-art system was available, we tested this dataset on the system of Siddiqui et al. (2016) and achieved an accuracy of 77.75% in testing. For the same reason, we further evaluated our work by providing the dataset to three native Arabic speakers, who confirmed the authenticity of the generated dataset.
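The Filtering and Annotation stages described above can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions, not the system built in this work: the `Review` type, the link-based spam heuristic, the cue-word lists and the automatic stand-in annotator are all hypothetical, and the abstract does not say how these decisions were actually made.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Review:
    source: str                  # e.g. "twitter", "youtube"
    text: str
    label: Optional[str] = None  # "positive" or "negative" once annotated


URL_PATTERN = re.compile(r"https?://\S+")


def normalize(text: str) -> str:
    """Collapse whitespace so trivially different copies compare equal."""
    return " ".join(text.split())


def looks_like_spam(text: str) -> bool:
    # Hypothetical heuristic: treat any item containing a link as spam.
    return bool(URL_PATTERN.search(text))


def filter_reviews(reviews: Iterable[Review],
                   carries_sentiment: Callable[[str], bool]) -> List[Review]:
    """Drop empty items, repeats, spam, and items judged to carry no sentiment."""
    seen = set()
    kept = []
    for review in reviews:
        text = normalize(review.text)
        if not text or text in seen:
            continue  # empty or repeated item
        seen.add(text)
        if looks_like_spam(text) or not carries_sentiment(text):
            continue
        kept.append(Review(review.source, text))
    return kept


def annotate(reviews: Iterable[Review],
             annotator: Callable[[str], str]) -> List[Review]:
    """Attach a Positive/Negative label chosen by the supplied annotator."""
    labelled = []
    for review in reviews:
        label = annotator(review.text)
        if label not in ("positive", "negative"):
            raise ValueError(f"unexpected label: {label!r}")
        labelled.append(Review(review.source, review.text, label))
    return labelled


# Hypothetical cue words, standing in for a human judgement.
POSITIVE_CUES = {"رائع"}
NEGATIVE_CUES = {"سيئة"}

sample = [
    Review("twitter", "الفيلم رائع"),
    Review("facebook", "الفيلم   رائع"),            # repeat, extra spaces
    Review("youtube", "الخدمة سيئة"),
    Review("instagram", "رائع http://example.com"),  # contains a link
    Review("keek", "اليوم الاثنين"),                 # no sentiment cue
]

filtered = filter_reviews(
    sample, lambda t: any(w in t for w in POSITIVE_CUES | NEGATIVE_CUES))
dataset = annotate(
    filtered,
    lambda t: "positive" if any(w in t for w in POSITIVE_CUES) else "negative")

for item in dataset:
    print(item.source, item.label, item.text)
```

Passing the sentiment check and the annotator in as callables keeps the sketch neutral about who makes those decisions; a human annotator could be substituted for the cue-word lambdas without changing the pipeline.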
dataset built, Arabic sentiment analysis, Arabic Community