Clustering Tweets to Discover Trending Topics about دبي (Dubai)

The British University in Dubai (BUiD)
Many people now turn to social networks to follow trending topics and news amid the huge volume of text posted there every day. One of these networks is Twitter, a microblogging hub and a rich source of data. Scanning tweets online is a hard task, and searching such a large amount of data for a particular topic is time consuming. This paper proposes a solution that collects tweets matching the keyword دبي (Dubai) using the Zapier website and stores them in a Google Sheet. The tweets are then converted to word vectors using TF-IDF weighting. The resulting vectors are fed into the k-means clustering algorithm, with cosine similarity used to measure the similarity between the objects in each cluster. The results show that internal evaluation techniques failed to assess the quality of the clusters. Nevertheless, interesting topics about دبي (Dubai) were found. Moreover, better results were achieved with Filter Tokens (by Region) than without it. The data for the experiment were collected over several periods to capture the most trending topics about دبي (Dubai). All results reported in this paper were obtained with real tweets.
Twitter, Arabic tweets, K-means clustering, TF-IDF, cosine similarity
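
The pipeline described in the abstract (TF-IDF word vectors, then k-means with cosine similarity) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the whitespace tokeniser, the smoothed IDF formula, the farthest-first choice of initial centroids, the function names, and the sample tweets are all assumptions made for illustration. The collection step (Zapier, Google Sheets) and the token filtering step are not shown.

```python
import math
from collections import Counter, defaultdict


def normalise(vec):
    """Scale a sparse vector (term -> weight) to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else dict(vec)


def tfidf_vectors(docs):
    """Build unit-length TF-IDF vectors from whitespace-tokenised texts.

    Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1, so that a term
    present in every document still carries a small weight.
    """
    tokenised = [doc.split() for doc in docs]
    n_docs = len(tokenised)
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        vec = {
            t: (c / len(tokens)) * (math.log((1 + n_docs) / (1 + df[t])) + 1)
            for t, c in tf.items()
        }
        vectors.append(normalise(vec))
    return vectors


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (a dot product)."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def kmeans_cosine(vectors, k, max_iter=20):
    """Cluster unit vectors with k-means, using cosine similarity.

    Assumption: deterministic farthest-first initialisation, i.e. start from
    the first vector and repeatedly add the vector least similar to the
    centroids chosen so far.
    """
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        )
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector joins its most similar centroid.
        new_labels = [
            max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the normalised sum of its members.
        for j in range(k):
            total = defaultdict(float)
            for v, label in zip(vectors, labels):
                if label == j:
                    for t, w in v.items():
                        total[t] += w
            if total:
                centroids[j] = normalise(total)
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical sample tweets, not taken from the paper's data set.
    tweets = [
        "دبي اكسبو معرض",
        "اكسبو دبي افتتاح معرض",
        "مطر طقس دبي",
        "طقس مطر غزير دبي",
    ]
    labels, centroids = kmeans_cosine(tfidf_vectors(tweets), k=2)
    for j, centroid in enumerate(centroids):
        top_terms = sorted(centroid, key=centroid.get, reverse=True)[:3]
        print(j, top_terms, [i for i, label in enumerate(labels) if label == j])
```

Because every vector is scaled to unit length, cosine similarity reduces to a dot product, and the highest-weighted terms of each centroid give a rough label for the topic of that cluster.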