Investigating Cross-Lingual Hate Speech Detection on Social Media

The British University in Dubai (BUiD)
Social media platforms are becoming an integral part of our lives. Online users upload massive amounts of content to these platforms every second. Social media sites give users an exciting space to freely express their views and to share news, thoughts and insights about any topic of interest. At the same time, these platforms are becoming a breeding ground for toxic behaviour, online harassment, personal attacks and hate speech. As a result, many social media users have closed their accounts to protect their psychological and physical safety. Major social media platforms such as Facebook, Twitter and YouTube are taking this problem very seriously and are investing heavily to maintain the trust, safety and integrity of the users on their platforms. However, recent research conducted in the United States on a sample of online users indicated that over 40% have personally experienced online harassment, and almost every online user is asking major online technology companies to act against it (Pew Research, USA, 2019). With social media platforms available in many languages and across different regions, hate speech and online harassment are becoming a large-scale global problem that affects online users around the world. There is therefore an increasing demand to advance research and development in detecting online hate speech, not only for English but also for other languages. Previous research has mainly focused on hate speech in major languages such as English and French, while very limited work has been done on other emerging languages such as Arabic, where Internet penetration is growing rapidly. In this research, we investigate techniques for detecting online hate speech in the Arabic language.
Our contribution in this work can be summarised in two parts. The first is to study the challenges of detecting hate speech in noisy, informal, user-generated comments and tweets in Arabic. The second is to investigate novel approaches for building effective techniques to tackle this problem.
hate speech detection, cross-lingual, social media