Detecting Arabic Cyberbullying Tweets in Arabic Social Using Deep Learning

The British University in Dubai (BUiD)
The widespread use of social media platforms in recent years has made cyberbullying a significant concern. It can have severe effects on individuals, including despair, anxiety, and even suicide. Because vast volumes of electronic text are difficult to inspect and categorize manually, conventional methods for recognizing and combating cyberbullying have not proven successful. As a consequence, deep learning has emerged as a potential solution. Artificial neural networks and other deep learning approaches can automatically learn patterns and features from large quantities of data, so they can be applied to electronic text to spot cyberbullying-related trends. Natural language processing techniques can be applied to text data to extract useful features such as sentiment, emotion, and subjectivity. To examine cyberbullying in social media using machine learning and deep learning techniques, a sizable dataset of electronic text was gathered from multiple social media platforms, including Twitter, Instagram, YouTube, and many other sites. Before cyberbullying analysis can be carried out, the data must first be prepared so that deep learning algorithms can be trained on it. Manually annotated data from a corpus collection was used to label the data for training, and pre-processing is a vital part of data preparation for cyberbullying detection. Arabic has several varieties, of which the three most common are dialectal Arabic (DA), Modern Standard Arabic, and Classical Arabic. Because of its widespread use on social media, dialectal Arabic is the subject of this study.
The data was preprocessed and then classified according to the presence of cyberbullying. Two classification settings were adopted in this work. The first was a 2-class classification in which the data was labeled as either cyberbullying or not cyberbullying. The second was a 6-class classification covering six different types of cyberbullying. To categorize electronic text in these two settings, deep learning models, namely convolutional neural networks (CNN), recurrent neural networks (RNN), and a combined CNN-RNN, were trained on this data. The trained models were evaluated on an independent test set and showed promise in identifying cyberbullying on social media. In the 2-class classification, LSTM achieved the best accuracy at 95.59%, while in the 6-class classification the best accuracy, 78.75%, was obtained with CNN. LSTM also achieved the highest F1-scores in both the 2-class and 6-class classifications, at 96.73% and 89%, respectively. These findings show how well deep learning techniques work in detecting cyberbullying and emphasize their potential for developing automated systems that identify and combat cyberbullying on social media.
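The abstract notes that pre-processing is a vital step but does not list the operations used. As an illustration only, the following Python sketch applies normalization steps that are commonly used for dialectal Arabic tweets. The function name `normalize_tweet` and every rule in it are assumptions made for this example, not the procedure followed in the study.

```python
import re

# All rules below are illustrative assumptions; the source does not
# specify which normalization steps were actually applied.
URL_OR_MENTION = re.compile(r"https?://\S+|www\.\S+|@\w+")
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")    # tashkeel marks
TATWEEL = "\u0640"                                   # elongation character
ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")  # alef with madda/hamza
REPEATED = re.compile(r"(.)\1{2,}")                  # a character repeated 3+ times
NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")       # anything but Arabic letters


def normalize_tweet(text: str) -> str:
    """Return a cleaned, whitespace-normalized version of an Arabic tweet."""
    text = URL_OR_MENTION.sub(" ", text)      # drop links and user mentions
    text = DIACRITICS.sub("", text)           # strip diacritics inside words
    text = text.replace(TATWEEL, "")          # remove tatweel stretching
    text = ALEF_VARIANTS.sub("\u0627", text)  # unify alef forms to bare alef
    text = REPEATED.sub(r"\1", text)          # collapse emphatic letter repetition
    text = NON_ARABIC.sub(" ", text)          # remove digits, Latin text, punctuation
    return " ".join(text.split())             # collapse runs of whitespace


if __name__ == "__main__":
    print(normalize_tweet("@user أنتَ غبيييي جداً!! http://t.co/abc"))
```

The cleaned strings would then be tokenized and converted to integer sequences before being passed to a CNN, RNN, or LSTM classifier. Whether steps such as collapsing repeated letters help or hurt is dataset-dependent, since elongation can itself signal emphasis in abusive text.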
technology, deep learning, Arabic cyberbullying, social media