Please use this identifier to cite or link to this item: https://bspace.buid.ac.ae/handle/1234/1196
Title: Clustering Tweets to Discover Trending Topics about دبي (Dubai)
Authors: ALYALYALI, SALAMA KHAMIS SALEM KHAMIS
Keywords: Twitter
Arabic tweets
K-means clustering
TF-IDF
cosine similarity
Issue Date: Mar-2018
Publisher: The British University in Dubai (BUiD)
Abstract: Nowadays, many people turn to social networks to learn about trending topics and news amid the huge volume of text posted there daily. One of these networks is Twitter, a microblogging hub and a rich source of data. Scanning tweets online is a hard task, and searching a huge amount of data for an intended topic is time-consuming. This paper proposes a solution that collects a corpus of tweets about دبي (Dubai) using the Zapier website and stores them in a Google Sheet. A word vector is then created for the tweets using the TF-IDF method. The resulting vectors are fed into the k-means clustering algorithm, with cosine similarity used to measure the similarity between the objects in each cluster. The results demonstrate that internal evaluation techniques failed to evaluate the quality of the clusters. In addition, interesting topics about دبي (Dubai) were found. Moreover, better results were achieved with Filter Tokens (by Region) than without it. The data for the experiment were collected over several periods to ensure that the most trending topics about دبي (Dubai) were captured. All of the results reported in this paper were tested with real tweets.
URI: http://bspace.buid.ac.ae/handle/1234/1196
Appears in Collections: Dissertations for Informatics (Knowledge and Data Management)
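
The pipeline described in the abstract (TF-IDF word vectors, then k-means clustering with cosine similarity) can be illustrated with a minimal pure-Python sketch. This is not the dissertation's implementation or toolchain. The function names `tfidf_vectors`, `cosine`, and `kmeans_cosine` are hypothetical, the sample tweets are invented English placeholders, tokenisation is plain whitespace splitting, and seeding the centroids with the first k vectors is a simplification chosen to keep the run deterministic.

```python
import math
from collections import Counter, defaultdict

def tfidf_vectors(docs):
    """Return one L2-normalised sparse TF-IDF vector (dict) per document."""
    tokenised = [doc.split() for doc in docs]  # assumption: whitespace tokens
    n_docs = len(tokenised)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        vec = {t: (c / len(tokens)) * math.log(n_docs / df[t])
               for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors

def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (their dot product)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def kmeans_cosine(vectors, k, n_iter=20):
    """k-means that assigns each vector to its most cosine-similar centroid."""
    centroids = [dict(v) for v in vectors[:k]]  # assumption: first k as seeds
    labels = None
    for _ in range(n_iter):
        new_labels = [max(range(k), key=lambda j: cosine(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            total = defaultdict(float)
            for v, label in zip(vectors, labels):
                if label == j:
                    for t, w in v.items():
                        total[t] += w
            norm = math.sqrt(sum(w * w for w in total.values()))
            if norm:  # keep the previous centroid if the cluster is empty
                centroids[j] = {t: w / norm for t, w in total.items()}
    return labels

# Invented placeholder tweets; a real run would use the collected corpus.
docs = [
    "dubai expo pavilion opening",
    "dubai metro delay station",
    "dubai expo tickets pavilion",
    "dubai metro station crowd",
]
vectors = tfidf_vectors(docs)
labels = kmeans_cosine(vectors, k=2)
print(labels)
```

In this sketch the search keyword itself ("dubai") occurs in every document, so its inverse document frequency is zero and it contributes nothing to the similarity between tweets. The remaining terms drive the grouping.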

Files in This Item:
2015228094.pdf (2.26 MB, Adobe PDF)


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.