Detecting Arabic Cyberbullying Tweets in Arabic Social Media Using Deep Learning
الكشف عن تغريدات التنمر الإلكتروني باللغة العربية على مواقع التواصل الاجتماعي العربية باستخدام التعلم العميق

by

FARIS KHAMIS ATIQ ALFALASI

Dissertation submitted in partial fulfilment of the requirements for the degree of

MSc INFORMATICS

at

The British University in Dubai

June 2023

DECLARATION

I warrant that the content of this research is the direct result of my own work and that any use made in it of published or unpublished copyright material falls within the limits permitted by international copyright conventions. I understand that a copy of my research will be deposited in the University Library for permanent retention.

I hereby agree that the contents of this dissertation, for which I am author and copyright holder, may be copied and distributed by The British University in Dubai for the purposes of research, private study or education, and that The British University in Dubai may recover from purchasers the costs incurred in such copying and distribution, where appropriate. I understand that The British University in Dubai may make a digital copy available in the institutional repository.

I understand that I may apply to the University to retain the right to withhold or to restrict access to my thesis for a period which shall not normally exceed four calendar years from the congregation at which the degree is conferred, the length of the period to be specified in the application, together with the precise reasons for making that application.

___________________
Signature of the student

COPYRIGHT AND INFORMATION TO USERS

The author whose copyright is declared on the title page of the work has granted to the British University in Dubai the right to lend his/her research work to users of its library and as an open source and to make partial or single copies for educational and research use. The author has also granted permission to the University to keep or make a digital copy for similar use and for the purpose of preservation of the work digitally.

Multiple copying of this work for scholarly purposes may be granted by either the author, the Registrar or the Dean only. Copying for financial gain shall only be allowed with the author's express permission. Any use of this work in whole or in part shall respect the moral rights of the author to be acknowledged and to reflect in good faith and without detriment the meaning of the content and the original authorship.

ABSTRACT

The widespread engagement with social media platforms in recent years has made cyberbullying a significant concern. It can have catastrophic effects on individuals, including despair, anxiety, and even suicide. Because manually detecting and categorizing vast volumes of electronic text data is difficult, conventional methods for recognizing and combating cyberbullying have not proven successful. As a consequence, deep learning methods have become a potential solution to this problem. Artificial neural networks and other deep learning approaches can automatically identify patterns and features from massive quantities of data. These methods may be applied to electronic text data to spot cyberbullying-related trends, and natural language processing techniques may be used to extract useful features such as sentiment, emotion, and subjectivity.
A sizable dataset of electronic text was gathered from multiple social media platforms, such as Twitter, Instagram, and YouTube, in order to examine cyberbullying in social media using machine learning and deep learning techniques. The data had to be prepared before deep learning algorithms could be trained on it and cyberbullying analysis could be carried out. Manually annotated data from a corpus collection was used to label the text for deep learning purposes, and pre-processing is a vital part of this data preparation. There are several varieties of Arabic, but the three most common are Dialect Arabic (DA), Modern Standard Arabic (MSA), and Classical Arabic (CA). Because of its widespread use on social media, Dialect Arabic is the focus of this dissertation. The data was then preprocessed and classified according to the presence of cyberbullying. Two classification cases were adopted in this work. The first was a two-class classification, where the data was labeled as either cyberbullying or not cyberbullying. The second was a six-class classification consisting of six different cyberbullying types. To categorize electronic text in these two cases, deep learning models, namely convolutional neural networks (CNN), recurrent neural networks (RNN), and a combined CNN-RNN model, were trained on this data. The trained models were assessed on an independent test set and showed promise in identifying cyberbullying on social media. The results obtained from the two-class classification showed the superiority of LSTM in terms of accuracy with 95.59%, while the best accuracy in the six-class classification was obtained by the CNN with 78.75%. Meanwhile, the F1-score was highest for LSTM in both the two-class and six-class classifications, with 96.73% and 89%, respectively. These findings show how well deep learning techniques work in detecting cyberbullying and emphasize their potential for building automated systems that identify and combat cyberbullying on social media.

ABSTRACT (in Arabic)

أدى التفاعل الواسع النطاق مع منصات وسائل التواصل الاجتماعي في السنوات الأخيرة إلى جعل التنمر عبر الإنترنت مصدر قلق كبير. قد يكون للأفراد آثار جانبية كارثية من ذلك أيضًا، بما في ذلك اليأس والقلق وحتى الانتحار. نظرًا لصعوبة اكتشاف وتصنيف كميات هائلة من البيانات النصية الإلكترونية يدويًا، فإن الطرق التقليدية للتعرف على التنمر الإلكتروني ومكافحته لم تثبت نجاحها. نتيجة لذلك، أصبحت أساليب التعلم العميق حلاً محتملاً لهذا الموقف. يمكن للشبكات العصبية الاصطناعية وغيرها من مناهج التعلم العميق تحديد الأنماط والميزات تلقائيًا من كمية هائلة من البيانات. يمكن تطبيق هذه الأساليب على تحليل البيانات النصية الإلكترونية لاكتشاف الاتجاهات المتعلقة بالتسلط عبر الإنترنت. يمكن استخدام تقنيات معالجة اللغة الطبيعية على البيانات النصية لاستخراج ميزات مفيدة مثل المشاعر والعاطفة والذاتية.

تم جمع مجموعة بيانات كبيرة من البيانات النصية الإلكترونية من العديد من منصات الوسائط الاجتماعية مثل Twitter وInstagram وYouTube والعديد من المواقع الأخرى من أجل فحص التنمر عبر الإنترنت في وسائل التواصل الاجتماعي باستخدام التعلم الآلي وتقنيات التعلم العميق. يجب إعداد البيانات مبدئيًا بحيث يمكن تدريب خوارزميات التعلم العميق عليها قبل إجراء تحليل التنمر الإلكتروني. تم استخدام البيانات المشروحة يدويًا من مجموعة المدونات لتسمية المعلومات لأغراض التعلم العميق. تعد المعالجة المسبقة جزءًا حيويًا من عملية إعداد البيانات لاكتشاف التسلط عبر الإنترنت.
هناك عدة أنواع من اللغة العربية، ولكن الأنواع الثلاثة الأكثر شيوعًا هي اللهجة العربية والعربية الفصحى الحديثة والعربية الفصحى. نظرًا لاستخدامها على نطاق واسع على وسائل التواصل الاجتماعي، فإن اللهجة العربية (DA) هي موضوع هذه الدراسة. بناءً على وجود التسلط عبر الإنترنت، تمت معالجة البيانات وتصنيفها مسبقًا. في هذا العمل، تم اعتماد حالتين من التصنيف. كانت الحالة الأولى عبارة عن تصنيف من فئتين حيث تم تصنيف البيانات على أنها إما تسلط عبر الإنترنت أو ليست تسلطًا عبر الإنترنت. كانت الحالة الثانية عبارة عن تصنيف من ست فئات تتكون من ستة أنواع مختلفة من التسلط عبر الإنترنت. لتصنيف النص الإلكتروني في هاتين الحالتين، تم تدريب نماذج التعلم العميق مثل الشبكات العصبية التلافيفية والشبكات العصبية المتكررة ومزيج من CNN-RNN على هذه البيانات. في مجموعة اختبار مستقلة، تم تقييم النماذج المدربة، وأظهرت نتائج واعدة في تحديد التنمر عبر الإنترنت على وسائل التواصل الاجتماعي. أظهرت النتائج التي تم الحصول عليها من تصنيف الفئتين تفوق LSTM من حيث الدقة بنسبة 95.59٪، بينما تم الحصول على أفضل دقة في تصنيف الفئات الست من تطبيق CNN بنسبة 78.75٪. وفي الوقت نفسه، كانت نتائج F1 هي الأعلى في LSTM لتصنيفي الفئتين والفئات الست بنسبة 96.73٪ و89٪ على التوالي. تؤكد هذه النتائج على إمكانية تطبيق تقنيات التعلم العميق في تطوير الأنظمة الآلية لتحديد ومكافحة التسلط عبر الإنترنت على وسائل التواصل الاجتماعي وتُظهر مدى نجاحها في اكتشاف التنمر الإلكتروني.

ACKNOWLEDGEMENTS

I would like to acknowledge the whole community at The British University in Dubai (BUiD) for providing me the opportunity to explore and major in informatics, which has given me a wealth of knowledge. I also want to thank Prof. Manar Alkhatib for supervising this dissertation and making it possible for me to complete this work. I would also like to thank Dr. Khaled Shaalan for his assistance and counsel during this study. It would have been difficult to produce research of this caliber without their support. Lastly, I want to express my gratitude to my family, particularly my sisters and parents, who all encouraged and supported me in pursuing my education.

TABLE OF CONTENTS

1. CHAPTER ONE: Introduction
   1.1 Problem Definition
   1.2 Research Objectives
   1.3 Thesis Contribution
   1.4 Dissertation Organization
2. CHAPTER TWO: Related Work
   2.1 Arabic Sentiment Analysis Using Machine Learning and Neural Network
   2.2 Arabic Cyberbullying Using Machine Learning and Neural Network
   2.3 English Cyberbullying Using Machine Learning and Neural Network
3. CHAPTER THREE: Literature Review
   3.1 Background
      3.1.1 Cyberbullying Definition
      3.1.2 Cyberbullying on Social Media
      3.1.3 Cyberbullying Forms
      3.1.4 Sentiment Analysis
      3.1.5 Emotion
      3.1.6 Arabic Language
      3.1.7 Arabic Text Cyberbullying Types
   3.2 Deep Learning Approaches
      3.2.1 Convolutional Neural Networks (CNN)
      3.2.2 Recurrent Neural Networks (RNN)
      3.2.3 Long Short-Term Memory Networks (LSTM)
      3.2.4 Hybrid CNN-RNN
      3.2.5 A Comparison Between Deep Learning Models
   3.3 Data Pre-Processing
      3.3.1 Data Cleaning
      3.3.2 Vectorization and Word Embedding
      3.3.3 Continuous Bag-of-Words Model (CBOW)
      3.3.4 Skip-Gram (SG) Model
      3.3.5 Feature Extraction
      3.3.6 Annotation
      3.3.7 Training Data and Test Data
      3.3.8 Metrics Evaluation
4. CHAPTER FOUR: Dataset Corpus
   4.1 Data Collection
   4.2 Two-Class Classification
   4.3 Six-Class Classification
5. CHAPTER FIVE: Methodology
   5.1 Data Preprocessing Methodology
      5.1.1 Collecting Data
      5.1.2 Data Preparation
      5.1.3 Training Dataset and Test Data
   5.2 Feature Extraction
   5.3 Deep Learning Models
   5.4 Evaluation Metrics
6. CHAPTER SIX: Evaluation and Results
   6.1 Experiment Setup
      6.1.1 Data Preprocessing
      6.1.2 Deep Learning Neural Network Models
         6.1.2.1 Feature Representation
         6.1.2.2 CNN Architecture
         6.1.2.3 Evaluation Metrics
         6.1.2.4 LSTM Architecture
   6.2 Experiments Results
      6.2.1 Results and Discussion
7. CHAPTER SEVEN: Conclusion and Future Work
References

LIST OF TABLES

Table 1: Cyberbullying Types and Examples
Table 2: Characteristics of the Two-Class Classification
Table 3: An Example of the Collected Data for the First Dataset
Table 4: Characteristics of the Six-Class Classification
Table 5: An Example of the Collected Data for the Six-Class Classification
Table 6: Training Data and Test Data
Table 7: Training Data and Test Data in Two-Class and Six-Class Classifications
Table 8: Summary of the Hyper-Parameters of the CNN Model
Table 9: Hyper-Parameters for the LSTM Model
Table 10: Hyper-Parameters for CNN-LSTM
Table 11: Performance for the 2-Class Classification
Table 12: Results of CNN on the 6-Class Classification
Table 13: Performance of LSTM on the 6-Class Classification
Table 14: Performance of the CNN-LSTM on the 6-Class Classification

LIST OF FIGURES

Figure 1: Cyberbullying Forms
Figure 2: CNN Convolutional Layer
Figure 3: CNN Pooling Layer
Figure 4: CNN Architecture
Figure 5: Visualizing Feature Maps in CNN
Figure 6: CNN Architecture
Figure 7: Basic Architecture of Recurrent Neural Networks
Figure 8: LSTM Single Cell
Figure 9: CNN-LSTM Architecture
Figure 10: Data Preprocessing and Cleaning
Figure 11: Raw Data Crawled from Twitter
Figure 12: Methodology Structure
Figure 13: 2-Class and 6-Class Classification Datasets Related Statistics
Figure 14: The Dataset for the 2-Class Classification
Figure 15: The Dataset for the 6-Class Classification
Figure 16: Training Data Categories for the 6-Class Classification
Figure 17: Test Data Categories in the 6-Class Classification
Figure 18: Dataset Categories for the 6-Class Classification
Figure 19: LSTM Architecture
Figure 20: CNN Architecture
Figure 21: CNN-LSTM Architecture
Figure 22: Convolutional Neural Network (CNN) Architecture
Figure 23: Results of the 2-Class Classification
Figure 24: Accuracy on the 2-Class Classification
Figure 25: Results of Implementing CNN on the 6-Class Classification
Figure 26: Results of LSTM on the 6-Class Classification
Figure 27: Results of CNN-LSTM on the 6-Class Classification
Figure 28: Accuracy in the 6-Class Classification
Figure 29: Accuracy in Both the 2-Class and 6-Class Classifications

LIST OF ABBREVIATIONS

ACB: Animal Cyberbullying
CA: Classical Arabic
CBOW: Continuous Bag-of-Words Model
CNN: Convolutional Neural Network
DA: Dialect Arabic
DT: Decision Tree
DL: Deep Learning
KNN: K-Nearest Neighbor
LR: Logistic Regression
LCB: Looking Cyberbullying
LSTM: Long Short-Term Memory
ML: Machine Learning
MLP: Multilayer Perceptron
MSA: Modern Standard Arabic
NB: Naive Bayes
NCB: Not Cyberbullying
PSCB: Psychological Cyberbullying
RF: Random Forest
RCB: Religious Cyberbullying
RNN: Recurrent Neural Network
SA: Sentiment Analysis
SCB: Sexual Cyberbullying
SG: Skip-Gram Model
SVM: Support Vector Machine

1. CHAPTER ONE: INTRODUCTION

Nowadays, advances in technology have greatly improved our ability to access information, communicate with others, and complete tasks quickly and efficiently. However, there are also concerns about the negative impacts of technology, such as cyber-attacks and threats to privacy, security, and social interactions. Statistics indicate that about 18% of children in Europe have suffered bullying or harassment through Internet and mobile communication. Twitter is a popular social networking platform that allows users to share information through tweets and comments. People around the world are now connected through technology, regardless of time, location, or the distance between them. Social network platforms are trending, and most people, particularly teenagers, are eager to join them and build online relationships. The anonymity of social media, where users often use fake names rather than their real names, has resulted in a massive number of cybercrimes. There are several types of online crime, for instance cyberbullying, which causes suffering for its victims. Cyberbullying is considered a severe ethical issue on internet platforms, and the number of people, mostly teenagers, who have been cyberbullying victims is alarming.

Cyberbullying refers to the use of technology, such as social media, to harass, intimidate, or otherwise harm others. This can include actions such as posting hurtful comments or messages, sharing embarrassing pictures or videos, or creating fake profiles to mock or harass someone. Cyberbullying can be particularly harmful because it can happen 24/7 and can be difficult for the victim to escape from, as social media is often a constant presence in their lives. Additionally, the anonymity and distance provided by the internet can make it easier for bullies to act aggressively and without consequence.
In order to estimate cyberbullying propagation, various research papers have discussed cyberbullying, and the results show that cyberbullying is a common issue among youth nowadays, with an increasing number of victims [1]. Various cyberbullying detection techniques have been designed to monitor and prevent cyberbullying, and studies in this domain have increased. Despite its propagation and negative impact within Arabic culture, only some researchers have studied this form of offensive content in the Arabic language [2]. Research on cyberbullying on Arabic platforms is still limited and is considered a problematic task for several reasons. First, the Arabic language has extremely complicated morphology. Second, the majority of Arabic speakers use colloquial Arabic rather than Modern Standard Arabic (MSA). In addition, there are various words that are not acceptable in Arabic culture although they are completely acceptable in other societies [3]. For instance, the expression "dog" (كلب) or the word "donkey" (حمار) simply name farm animals, yet it is not acceptable to use these expressions in other contexts, for example to describe people or their actions.

Nowadays, automatic cyberbullying detection has led to significant developments in cyberbullying classification, particularly for the English language, as in [4] and [5]. However, few studies have used deep learning for Arabic cyberbullying detection on social media. In this study we therefore aim to enhance the accuracy of Arabic cyberbullying detection and improve the performance of such systems.

1.1 Problem Definition

Cyberbullying detection is one of the essential domains that is expanding widely due to the increase in online communication. Cyberbullying is defined as "willful and repeated harm inflicted through the use of computers, cell phones, and other electronic devices". The goal of cyberbullying detection here is to identify cyberbullying in Arabic-language text on social media. The problem can be framed as detecting the polarity of a post, classifying it as negative, positive, or neutral, and identifying negative content as cyberbullying. The fact that the Arabic language has not received as much attention as English is a wake-up call for Arabic researchers. Nevertheless, in the past few years a remarkable effort has been made on Arabic cyberbullying detection resources. The Arabic language poses more challenges than many other languages due to its complex morphology and the presence of different dialects. This study also examines the benefit of deep learning for detecting cyberbullying at large scale, such as on social media, in the Arabic language. The research challenge is addressed by comparing the performance of several deep learning classifiers on a number of datasets. The outcomes help us recognize how effective deep learning models are at classifying Arabic text and detecting cyberbullying on social media.

1.2 Research Objectives

The main goal of this research is to address the following challenges:

● To classify Arabic tweets into two categories: cyberbullying and non-cyberbullying.
● To generate a table of Arabic cyberbullying expressions from the collected data.
● To apply deep learning classifiers to the collected dataset and compare the empirical results with the results of conventional learning techniques from previous research.
● To implement the model using Python.

The research aims to answer the following questions:

RQ1: How can cyberbullying and online abuse be detected and prevented on Arabic social media platforms?
RQ2: What are the issues in processing Arabic tweets, and how can they be solved?
RQ3: What is the best-performing model for cyberbullying detection?

1.3 Thesis Contribution

This dissertation aims to fill the research gap in cyberbullying detection for Arabic social media, focusing on Dialect Arabic. It presents and examines various deep learning classification models for Arabic social media cyberbullying content. We also seek to enhance these DL models to obtain superior results for the Arabic language, and we supply a new dataset for cyberbullying detection in Arabic social media.

1.4 Dissertation Organization

The dissertation is organized as follows:

● Chapter 1: Introduction: presents an overview of cyberbullying.
● Chapter 2: Related Work: summarizes the studies that have been conducted in this domain.
● Chapter 3: Background: reviews cyberbullying and the fundamental concepts involved, investigates several cyberbullying detection designs, the importance of cyberbullying detection, and its challenges.
● Chapter 4: Dataset Corpus: describes and analyzes the datasets used in this thesis.
● Chapter 5: Methodology: illustrates the data collection and processing methodology.
● Chapter 6: Evaluation and Results: shows the experimental results gained by implementing the deep neural network techniques CNN, RNN, and the combined CNN-RNN, and compares them.
● Chapter 7: Conclusion and Future Work: concludes the dissertation and suggests potential directions for future research.

2. CHAPTER TWO: RELATED WORK

This chapter presents a literature review of the bullying classification and sentiment analysis approaches that researchers have proposed. Most previous work on cyberbullying and sentiment analysis can be categorized into three groups: machine learning, deep learning, and sentiment orientation. Earlier work mostly used machine learning techniques such as Naïve Bayes (NB), Decision Trees, and Support Vector Machines (SVM). More recently, scholars have turned to neural networks for cyberbullying detection, using CNN, LSTM, or RNN models.

2.1 Arabic Sentiment Analysis Using Machine Learning and Neural Network

Machine learning and deep learning have been applied to many NLP tasks, including sentiment analysis in several languages. Despite all the research performed on English sentiment analysis, few studies have been conducted on Arabic social media. Analyzing such huge volumes of data to enhance products and services and meet customer needs is quite difficult [6]; thus, the evaluation process must be automated using sentiment analysis tools. The majority of earlier efforts concentrated on text-based, mono-modal sentiment analysis. In [6], an SVM-based classification approach is described and validated using a new Arabic multimodal dataset. The authors of this study considered only SVM as a classifier, without any other classifiers to validate the results.
The purpose of the study in [7] is to enhance the labeling procedure for sentiment analysis. Two strategies were used. First, a neutral class was added in order to build a framework of trustworthy Twitter messages with positive, negative, or neutral sentiment. The second strategy involved relabeling in order to streamline the labeling procedure. The labeling procedure in this study only applied to a small set of positive or negative features: "profits" (ارباح), "losses" (خسائر), "green color" (باللون الاخضر), "growing" (زيادة), "distribution" (توزيع), "decline" (انخفاض), "financial penalty" (غرامة), and "delay" (تاجيل). Twenty of the 48 recorded and evaluated tweets had their labels changed, which decreased the categorization error by 1.34%. The researchers in this study were limited by the size of the data.

The "OMAM" systems created for SemEval-2017 Task 4 are discussed in [8]. For subtask A, the authors assessed English state-of-the-art techniques on Arabic tweets. For the remaining subtasks, they presented a topic-based methodology that takes subject specificities into account by predicting the topics or domains of incoming tweets and using this knowledge to forecast their sentiment. The results show that translating the English state-of-the-art technique into Arabic produced good outcomes without major improvements. The topic-based approach ranked first in subtasks C and E and second in subtask D. The results showed an accuracy of 41%, which could be improved further; in addition, the researchers implemented a single type of classifier, LSTM, without comparing other types of classifiers to evaluate the results.

The investigation of Arabic Twitter characterization in [9] was intended to enhance knowledge of the subject while also facilitating a better understanding of Arabic tweets. The study examined the effectiveness of the two schools of machine learning, namely feature engineering techniques and deep learning methods, on Arabic Twitter. The researchers considered models that have attained state-of-the-art performance for English opinion mining. The findings demonstrate the benefits of deep learning-based methods and underline the necessity of employing morphological abstractions to deal with the intricate morphology of Arabic. The results reached 58% accuracy, which can be enhanced further.

Convolutional neural networks (CNNs) for Arabic sentiment analysis are examined in [10] using a system called CNN-ASAWR on the ASTD and SemEval 2017 datasets. The authors investigated the impact of different unsupervised word representations acquired from unlabeled corpora. Without using any hand-engineered features, the experimental findings showed that they were able to surpass the prior state-of-the-art methods on these datasets. The study focused on CNN classifiers without considering other forms of classification to validate the results.

In [11], an Arabic-language Twitter corpus from Jordan is labeled as either positive or negative. The study examines various supervised machine learning sentiment analysis methods applied to social media posts made by Arabic users on topics of general interest in either Modern Standard Arabic (MSA) or the Jordanian dialect. A variety of weighting schemes, stemming, and N-gram approaches, as well as other settings, are tested in the experiments. The experimental findings show the best case for each classifier, and they show that the SVM classifier, which uses the term frequency-inverse document frequency (TF-IDF) weighting scheme and stemming with bigram features, outperforms the Naive Bayes classifier. The outcomes of this study also fared better than those of similar previous studies. The authors of this study were, however, restricted by the limited size of the data.
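To make this kind of baseline concrete, the following is a minimal sketch of a TF-IDF and SVM pipeline of the sort used in the studies above. It assumes the scikit-learn library; the example texts, labels, and parameter values are purely illustrative and are not taken from the cited works.

# Illustrative TF-IDF + linear SVM baseline for Arabic text classification.
# The tiny training set below reuses examples given later in Table 1; in
# practice the labeled corpora of the cited studies (or of Chapter 4) would
# be used instead.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

texts = ["انا بعرفك", "تفه عليك"]              # non-bullying, bullying examples
labels = ["not_cyberbullying", "cyberbullying"]

baseline = Pipeline([
    # Word unigrams and bigrams weighted by TF-IDF, as in the SVM baselines above.
    ("tfidf", TfidfVectorizer(analyzer="word", ngram_range=(1, 2))),
    ("svm", LinearSVC()),                       # linear support vector classifier
])

baseline.fit(texts, labels)
print(baseline.predict(["انت واحد ما بتستحي"]))

Stemming and alternative weighting schemes, which the cited studies report as important for Arabic, are omitted here for brevity.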
In [12], the authors provided a set of features that were combined with a machine learning sentiment analysis method based on Complement Naïve Bayes (CNB) and applied to social media datasets in Egyptian, Levantine, Saudi, and MSA Arabic. An Arabic emotion lexicon was used to derive many of the proposed features. Additionally, the model includes emoticon-based features and input-text-related information, such as the number of text segments, the length of the text, whether or not the text ends with a question mark, and so on. On six of the seven benchmarked datasets the researchers tested, the authors demonstrate that the offered features improved accuracy. The researchers in this investigation applied CNB alone, without considering any other models for evaluation purposes.

In [13], the authors introduced the Hotel Arabic-Reviews Dataset (HARD), a large Arabic hotel reviews dataset for sentiment analysis. HARD includes 490,587 hotel reviews from the Booking.com website. Each entry includes the Arabic-language review text, the reviewer's star rating (from 1 to 10), and other details about the hotel and reviewer. The authors provided both the entire unbalanced dataset and a balanced subset. They built six well-known classifiers using Modern Standard Arabic (MSA) and Dialectal Arabic (DA) to analyze the datasets, and evaluated the polarity and rating classifications of the sentiment analyzers. The authors restricted themselves to a limited set of domains.

Overall, the reviewed work based on unsupervised methods is restricted by data size and lexicon coverage, which limits the generalization of such methods, especially when deep learning is applied. Additionally, these works demand knowledge and expertise in the area. Moreover, using a single classification model without comparing it with other models weakens the reported results.

2.2 Arabic Cyberbullying Using Machine Learning and Neural Network

Because researchers encountered numerous challenges in detecting cyberbullying in the Arabic context, machine learning techniques have been used to detect cyberbullying automatically. Such techniques also help government agencies quickly resolve issues and create a secure and safe virtual environment.

In [2], the authors proposed the Naïve Bayes (NB) classifier algorithm for detecting cyberbullying on Arabic platforms. They trained and tested the classifier on a real dataset gathered from Twitter and YouTube. Their Arabic corpus included 25,000 comments that had been manually labeled as either bullying or non-bullying. They implemented the NB classifier for detecting cyberbullying and obtained an accuracy of 95.9%. The researchers reported results for only a single model, without comparing them with other models.

[14] used a feedforward neural network for Arabic cyberbullying detection, with tweets as the dataset.
The researchers altered several parameters in the neural network to gain better accuracy, including the number of hidden layers, the number of epochs, and the batch size. After various training experiments, they achieved better performance after a few epochs; a batch size of 16 and 7 hidden layers gave the best results, with 94.56% accuracy. However, the authors did not compare their results with other models for validation.

The approach in [15] is built on data mining techniques applied to a dataset of comments collected from Arabic Facebook posts, and it also presents an algorithm to gauge the severity of cyberbullying in a comment. The results reached an accuracy of 77% in cyberbullying detection using a Support Vector Machine classifier, compared with the Adaptive Boosting technique, which achieved the highest precision rate of 94%. The research achieved its target but was limited by the size of the dataset; in addition, it focused only on the Syrian dialect without considering other dialects.

Several classifiers were implemented in [16], including an SVM model with a radial basis function kernel, where lexical features alongside pre-trained embeddings were studied. The best results were obtained with SVM at 88.6%. The authors studied SVM alone, without comparing results with other classification methods for evaluation purposes.

With the intention of serving as a benchmark dataset for the automatic detection of Tunisian hateful content on social media, [17] offered the Tunisian Hate and Abusive Speech (T-HSAB) dataset. The authors give a thorough analysis of the data-gathering procedures and of how the annotation guidelines were created to ensure accurate dataset annotation. The consistency of the annotations was then assessed using Cohen's Kappa (k) and Krippendorff's alpha (α) as agreement metrics.

Similarly, [18] presented research aimed at finding offensive words on Arabic social media. Using typical patterns seen in harsh and disrespectful interactions, the researchers extracted a list of profane terms and hashtags. They also categorized Twitter users based on whether or not they used any of these words in their tweets. Using this classification, the authors expanded the list of profane words and presented the findings using a newly constructed dataset of Arabic tweets categorized as obscene, offensive, or clean. The researchers in the previously mentioned studies considered the differences between Arabic dialects and their impact on detecting offensive text, which limited their focus to a single dialect.

[19] generated a sizable Arabic dataset of obscene and offensive language. In order to understand how abusive language is used by Arabic speakers and which themes, dialects, and genders are most frequently connected with offensive tweets, the researchers analyzed the dataset. They conducted numerous tests using Support Vector Machine algorithms and obtained strong findings (F1 = 79.7) on the dataset. The authors applied only one classifier, SVM, so the results were not compared with other classifiers to validate the proposed method.
For sub-task A of the Multilingual Offensive Language Identification shared task (OffensEval 2020), part of SemEval 2020, [20] discussed how to use pre-trained BERT models with convolutional neural networks. The authors demonstrated that CNN and BERT work better together than alone, and they underlined the value of employing pre-trained language models for downstream tasks. The proposed system placed fourth in Arabic with a macro-averaged F1-score of 0.897, fourth in Greek with a score of 0.843, and third in Turkish with a score of 0.814. They also shared ArabicBERT, a set of pre-trained transformer language models for Arabic, with the community. The study was, however, limited by the dataset size.

2.3 English Cyberbullying Using Machine Learning and Neural Network

The research paper [4] applied machine learning using SVM and NB classifiers to detect and prevent bullying comments and tweets. The data was collected from Twitter, and the highest accuracy achieved was 71.25%, which leaves room for improvement.

A hierarchical attention network that considers the characteristics of cyberbullying in order to detect it was suggested in [21]. The main distinguishing features of the method are: (i) a hierarchical structure that mimics the structure of a social media session; (ii) attention mechanisms applied at the word and post level, allowing the model to pay varying levels of attention to words and posts depending on the scenario; and (iii) a cyberbullying detection task that additionally predicts the interval between two adjacent comments. The suggested design beats the state-of-the-art approach, according to tests on a dataset collected from Instagram, the social networking site where the greatest number of individuals have reported experiencing severe cyberbullying. The authors of the previously mentioned studies did not determine which kind of cyberbullying they were identifying.

[22] demonstrated how DNN models may be used for cyberbullying detection on a variety of topics across numerous social media platforms (SMPs). For all three datasets, these models outperformed the state-of-the-art results when combined with transfer learning. The authors conducted the experiments with a limited amount of data.

[23] described the task of categorizing a tweet as racist, sexist, or neither. This task is extremely difficult due to the intricacy of natural language constructs. To manage this complexity, the authors conducted extensive experiments with a variety of deep learning systems. The tests on a dataset of 16K labeled tweets demonstrate that these deep learning techniques perform about 18 F1 points better than the best char/word n-gram techniques, although the high F1 scores were attributed to over-fitting [24].
In order to identify whether or not a text contains hate speech, [25] suggested employing a deep learning technique based on the Recurrent Neural Network (RNN). The Twitter dataset used contains 1,235 records in total, of which 652 are classified as hatred. The authors focused on a single type of classifier, the RNN, to detect hate speech, without comparing other classification techniques, which weakens the evaluation of the proposed method.

The research in [26] aims to investigate how Natural Language Processing can be used to identify hate speech, and applies a recent method in this area to a dataset. Convolutional Neural Networks (CNNs), a type of deep learning algorithm, were proposed because they outperform traditional methods on text classification problems. The classifier categorizes each tweet in the Twitter dataset into one of three categories: hate, offensive language, or neither. The model's effectiveness was evaluated by measuring its accuracy, precision, recall, and F-score; the final model achieved 91% accuracy, 91% precision, 90% recall, and a 90% F-measure. The authors concentrated their study on CNNs for hate speech recognition without comparing other techniques, which weakens the evaluation of the presented model.

In conclusion, after reviewing the previously mentioned approaches, we can state that our research addresses the majority of the limitations of previous and recent research on cyberbullying detection in Arabic social media.

3. CHAPTER THREE: LITERATURE REVIEW

3.1 Background

In recent years, the world has seen various revolutions that were made possible by social media. It is a highly effective innovation in our lives and an impressive way to extend the borders of a person's experience and become socially active. However, social media is a double-edged sword. Several anti-social behaviors, in other words bullying, are spotted on social media. Bullying has recently become part of growing up; furthermore, it is not restricted to children or youth, as anyone can be a bullying victim.

3.1.1 Cyberbullying Definition

Cyberbullying is formally defined as "willful and repeated harm inflicted through the use of computers, cell phones, and other electronic devices" [27]. Another definition is "an aggressive, intentional act carried out by a group or individual, using electronic forms of contact, repeatedly and over time against a victim who cannot easily defend him or herself" [28]. In brief, bullies normally use technical communication to harm people. This harm may be inspired by annoyance, prevention, revenge, or a basic wish to manipulate others or to feel more powerful [29]. Occasionally, children cyberbully others to cope with their own low self-confidence and/or to cope with their peers [29]. There are many types of bullying. Once any form of bullying is published, it is very hard to take these posts back off the social media websites. This can happen at any time of the day or night, all week long, and it can affect victims wherever they are: by themselves, in public areas, at school, or even on the sports field [27]. Cyberbullying allows a bully to embarrass and offend the victim in online communities without ever being identified. Moreover, fear of punishment or of becoming a social outcast prevents victims and bystanders from reporting the incidents.
This makes cyberbullying a tough problem to manage. Cyberbullying behavior is not only unacceptable but can also lead to tragic consequences. The researchers in [30] show that "critical impacts occurred in almost all of the respondents' cases in the form of lower self-esteem, loneliness and disillusionment and distrust of people: The more extreme impacts were self-harm and increased aggression towards friends and family". They also mentioned that many victims developed "coping strategies." Many times, victims attempted to cope with cyberbullying all by themselves, which led to a tense situation. Moreover, it is difficult for victims' parents to realize what is going on with their child online. For support systems to be able to assist the victim, they have to recognize the cyberbullying, or any indicator of it, at an early stage; they should not expect the victim to inform them about it. This calls for automated cyberbullying detection programs that can warn family members about cyberbullying.

3.1.2 Cyberbullying on Social Media

Compared to email spam detection, where several recipients receive the same broadcast message, detecting cyberbullying, which is more individualized and context-directed, is much harder. Most cyberbullying has been found to concentrate on specific subjects, including racism and ethnicity, sexual identity and sexuality, body shape and looks, intelligence, and social inclusion and rejection. Recognizing whether a textual expression and its sentiment relate to these topics, and whether the tone is positive or negative, is essential in detecting probable cyberbullying comments [31] [32].

3.1.3 Cyberbullying Forms

This part concentrates on the features of cyberbullying detection in social media. It comprises user personality features, sentiment features, emotional features, and Twitter-based features. The following forms of cyberbullying are reported on social media [33]:

1- Exclusion: excluding or eliminating somebody from a group on social media, which can leave the victim feeling dejected and isolated.
2- Harassment: sending frequent harmful and insulting online messages to a victim, where the messages may include threats.
3- Cyberstalking: sending false accusations and threats to the victim about themselves and their loved ones, which frightens them.
4- Outing: publicly posting private and personal information about the victim on social media without their permission.
5- Frapping: when someone uses your social media account and claims to be the account's owner in order to post improper material. In this case, the victim is tied to online content that can destroy their reputation.
6- Trolling: posting contentious comments on social media purposely to upset and hurt others.
7- Dissing: posting private information or photos about another person with the intent to damage their reputation.
8- Flaming: initiating an online fight with the victim.
9- Denigration: publishing false gossip about someone else.
10- Trickery: tricking people into trusting the bully so that they give out secrets, which are then used to humiliate or mock them.
11- Masquerade: the bully pretends to be someone they are not. Cyberbullies can set up fake online profiles on behalf of victims and use these profiles to publish false content in the victims' names without their consent.
12- Catfishing: stealing somebody's identity online and creating a false profile to fool others.
The majority of the studied literature does not identify which form of cyberbullying is being detected. Nonetheless, the most common form of cyberbullying is online harassment, as in the literature [34] [35] [21]. There are also sub-forms of harassment in the reviewed literature, such as aggression [36], [37] and toxicity [38]. Figure 1 illustrates the cyberbullying forms listed above (Exclusion, Harassment, Cyberstalking, Outing, Frapping, Trolling, Dissing, Flaming, Denigration, Trickery, Masquerade, and Catfishing).

Figure 1: Cyberbullying Forms

3.1.4 Sentiment Analysis

Sentiment analysis, also known as opinion mining, is the use of natural language processing, text analysis, and computational linguistics to identify and extract subjective information from source materials. It is used to determine the attitudes, opinions, and emotions of a speaker or writer with respect to some topic, or the overall contextual polarity of a document. Sentiment analysis can be used to monitor public opinion on social media platforms, analyze customer feedback, and even assist in political campaigns. It is an important tool for businesses and organizations to understand how their products and services are perceived by the public. Sentiments are represented by the ideas or concepts evoked by the feelings attached to something, usually classified as positive, neutral, or negative [39], and are obtained from the opinion expressed in a document, sentence, or subjective text on social media. The way this unstructured data is computationally treated is addressed as sentiment analysis, and approaches can be classified into machine learning methods, lexicon-based techniques, and hybrid approaches [40]. Machine learning methods implement algorithms such as SVM, Naïve Bayes, and Decision Trees, while the lexicon-based technique relies on sentiment lexicons (i.e. dictionaries of opinion expressions and phrases with designated polarities and strengths) to measure the sentiment of a text. In the context of cyberbullying, sentiment has been used to differentiate between non-bullies, victims, and bullies [9]–[11]. For illustration, [41] identified victims of cyberbullying using their sentiment scores, as victims frequently express negative feelings or emotions such as loneliness, anxiety, and depression.

3.1.5 Emotion

In contrast to sentiment analysis, emotion analysis detects forms of feelings in the text, such as anger, disgust, fear, happiness, sorrow, and surprise. Three popular techniques exist for textual emotion detection: keyword-based (i.e. using dictionaries of synonyms and antonyms), learning-based (based on previously trained classifiers), and hybrid (a mixture of keyword and learning techniques) [44]. Emotion analysis has been applied in several fields, for instance novels [45] as well as suicide notes [46], [47]. The authors in [41] identified seven emotions that are common in cyberbullying behavior: empathy, embarrassment, anger, relief, pride, fear, and sadness. In particular, the researchers discovered that accusers express more fear but less anger, compared with victims and reporters, who appeared to express more grief and relief. [48] checked for the presence of anger, fear, and sadness in cyberbullying; however, no significant effect was recorded, although there was an improvement in the accuracy of identifying aggressive behavior.
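As a concrete illustration of the lexicon-based technique described in Section 3.1.4, the short sketch below scores a text by summing the polarities of the lexicon entries it contains. The lexicon entries and the scoring rule are illustrative assumptions and do not correspond to any specific resource used in this dissertation.

# Minimal lexicon-based polarity scorer (illustrative only). A real system
# would use a full Arabic sentiment lexicon and handle negation, dialects,
# and morphological variation.
polarity_lexicon = {
    "جميل": 1,     # "beautiful" -> positive (assumed entry)
    "ممتاز": 1,    # "excellent" -> positive (assumed entry)
    "غبي": -1,     # "stupid"    -> negative (assumed entry)
    "بشعة": -1,    # "ugly"      -> negative (assumed entry)
}

def lexicon_score(text):
    """Return 'positive', 'negative', or 'neutral' from summed word polarities."""
    score = sum(polarity_lexicon.get(token, 0) for token in text.split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(lexicon_score("أنتِ بشعة"))   # -> negative

Machine learning methods, by contrast, learn such polarity cues directly from labeled examples, which is the route taken by the deep learning models in this dissertation.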
3.1.6 Arabic Language

Arabic is a language spoken by over 300 million people in the Middle East, North Africa, and the Horn of Africa. It is the official language of many countries, including Saudi Arabia, Qatar, Kuwait, Iraq, Egypt, Lebanon, and the UAE. Arabic is a Semitic language, related to Hebrew and Aramaic. It is written in a script derived from the old Aramaic alphabet, which was developed in the Middle East over two thousand years ago. Arabic is written from right to left and consists of 28 letters, and its syntax is quite different from English. Arabic is also an important language in the Muslim world, as it is the language of the Qur'an, the holy book of Islam, and it is used in many Muslim countries as a second language. The complicated morphology of Arabic has discouraged many researchers from studying it. There are three types of Arabic: the first is Classical Arabic, the ancient language of the Holy Quran, which Muslims use in their prayers. The second is Modern Standard Arabic (MSA), the official language used in news broadcasts, formal documents, and education, and well known by Arabs. The last type is the dialects, the spoken varieties used among Arabic speakers. The dialects differ from one place to another; the Arab world has around 10 dialects depending on location [49] [50]. These dialect differences create differences in the meaning of some words; for instance, "Yetqalash" expresses a compliment in Yemen, while in Morocco it is an insult. The nature of the Arabic language makes it challenging; for example, there is no capitalization [49]. Arabic morphology causes a lot of ambiguity, and Arabic corpora are considered rare. Arabic ranks seventh among world languages, and it is growing widely on social media, which motivates research in the Arabic language domain.

A comprehensive search of the relevant publications and articles found few previous publications on cyberbullying detection in Arabic text, apart from some papers in the areas of machine learning and Natural Language Processing (NLP). The earlier research conducted on Arabic applied text preprocessing or text classification. In [51], the researcher presented a key phrase extraction technique that joins multiple key phrase extraction techniques with ML methods. The outcomes of the key phrase extraction techniques are used as features for the ML techniques, and the ML method classifies these features as a key phrase or not. They compared their results using three ML algorithms: Linear Discriminant Analysis, SVM [52], and Linear Logistic Regression [53]. Their results showed that SVM gives the best results in key phrase extraction among the three.

Much research has also been performed on Arabic named entity extraction, for example Named Entity Recognition for Arabic (NERA) [54], to identify proper names in Arabic text and documents. NERA uses a list of named entities and a corpus compiled from a variety of resources; the evaluation was done via recall, precision, and F1. The outputs showed satisfactory results: 86.3% recall, 89.2% precision, and 87.7% F1 for person named entities.
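For reference, the precision, recall, and F1 measures quoted here, and used again in the evaluation chapters, are the standard ones computed from the counts of true positives (TP), false positives (FP), and false negatives (FN):

\[
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
\]

As a quick check, the NERA figures above are internally consistent: with 89.2% precision and 86.3% recall, F1 = 2 × 0.892 × 0.863 / (0.892 + 0.863) ≈ 0.877, matching the reported 87.7%.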
Spam email filtering for Arabic and English was performed in [55]. They applied the proposed technique to pure Arabic, pure English, and a combination of Arabic and English emails. Many ML methods were used, including K-Nearest Neighbor (K-NN), NB, SVM [56], and neural networks. The performance of the proposed technique was evaluated across all of these methods, and SVM showed the best results on pure English. The system performance on Arabic emails was not satisfactory compared to English, because of the complicated nature of the Arabic language. The research also confirmed that stemming Arabic words improved the efficiency of the classifiers, with NB showing the best performance at 96.78% recall and an F1 of 92.42%. In addition, there have been efforts to detect spam in social media, for instance Twitter. Researchers in [57] developed a system to detect spam messages in Arabic tweets; the proposed system performed remarkably well in terms of accuracy, precision, and recall.

Sentiment analysis is considered one of the text classification tasks; it sorts text into positive, negative, or neutral [58]. Sentiment analysis of Arabic Facebook comments was performed by [59]. They constructed a corpus of 6,000 comments extracted from Facebook, preprocessed the data, and then applied classifiers to determine the sentiment type. Three types of classifiers were implemented: NB, Decision Trees, and SVM. The results proved the effectiveness of SVM compared with the other methods, with an accuracy of 73.4%. Another study on sentiment analysis of Arabic tweets was done by [60]; their specific contribution concerned dialects and Arabizi. They combined K-NN, SVM, and NB to classify the comments, and NB provided the best results. In [61], the researchers used dialectal Arabic text for sentiment detection. They implemented two methods: translating the dialectal terms into MSA and then performing detection with an MSA lexicon, or performing detection with a dialectal lexicon directly. SVM and NB were used as classifiers to detect the polarity as either positive or negative. The results showed improvements in precision, recall, and F-measure for translating into MSA over the other method. In [58], sentiment analysis was performed on Arabizi as well. The proposed system first converted the Arabizi text into Arabic using their own technique. They used a crowdsourcing tool to label the data, then implemented SVM and NB as classifiers; a comparison showed that SVM outperformed NB. In Saudi Arabia, Arabic tweets were analyzed by [62]. They used a hybrid method for analyzing tweets, which consisted of constructing a classifier and training it via a one-word dictionary. The results were compared with a supervised learning method and proved the superiority of the hybrid technique.

Impressive research has also been conducted on stemming for Arabic. Stemming is the process of reducing a word to its root [63]. Various Arabic stemmers are available, including rule-based stemmers, for example Khoja [64], as well as light stemmers. Light stemmers remove letters from words, namely prefixes and suffixes, without knowing the words' roots.
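The following is a minimal sketch of the light-stemming idea just described: it strips a few common Arabic prefixes and suffixes from a token without any knowledge of the root. The affix lists are a small illustrative subset, not the rule set of Khoja's stemmer or of any published light stemmer.

# Simplified Arabic light stemmer (illustrative; real light stemmers use
# longer, carefully ordered affix lists together with length checks).
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "و"]    # e.g. conjunctions and the definite article
SUFFIXES = ["هما", "ات", "ون", "ين", "ها", "ية", "ه", "ة"]

def light_stem(token):
    for prefix in PREFIXES:
        if token.startswith(prefix) and len(token) - len(prefix) >= 3:
            token = token[len(prefix):]
            break
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            token = token[:-len(suffix)]
            break
    return token

print(light_stem("الحكومة"))   # definite article and feminine ending removed

In contrast, a rule-based (root) stemmer such as Khoja attempts to recover the actual root of the word rather than simply trimming affixes.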
The Arabic language is considered a delicate and complicated language. Arabic slang tweets were labeled by appending a "bullying attitude" tag, classified according to the following cyberbullying forms [15]:
1) Sexual Cyberbullying: embarrassing words that sexually offend a person, for instance "Slut" "فاسقة".
2) Physical Sexual Cyberbullying: cyberbullying that refers specifically to the body's private sexual parts.
3) Religious Cyberbullying: expressions that target the religion and doctrine of the victims, such as "خاين", which means "traitor".
4) Political Cyberbullying: bullying expressions that target political behavior, for instance "ذنب الحكومة", which means "the government's tail".
5) Looking Cyberbullying: words that mock the victim's looks, such as "أنتِ بشعة", which means "you are ugly".
6) Animal Phrases Cyberbullying: describing people as types of animals, for instance "بومة", which means "owl".
7) Racism Cyberbullying: directed at the race or ethnic nationality of a person, for example "عبد أسود", which means "nigger".
8) Sectarian Cyberbullying: phrases that arise between two groups in political or cultural conflict, such as "شيعي حاقد", which means "hateful Shiite".
9) Cross-Cultural Cyberbullying: the abuser's attempt to offend the victim's culture [67], for example "شحاد", which means "beggar".
10) Psychological Cyberbullying: bullying associated with psychological abuse, for example "انت واحد ما بتستحي", which means "shame on you".
11) Praying Negatively for a Person: for instance "الله ياخد روحك", which means "may God take your soul".
12) General Cyberbullying: cyberbullying that does not belong to any of the previous forms, such as "تفه عليك", which means "spit on you".
13) Non Cyberbullying: text that does not contain any expressions associated with a cyberbullying attitude, for example "انا بعرفك", which means "I know you".
Table 1 lists the cyberbullying types with an example of each. This categorization of cyberbullying attitude has been validated by Arabic language experts and native Arabic speakers.
Table 1: Cyberbullying types and examples
| Cyberbullying Type | Arabic Example | English Translation |
| Sexual Cyberbullying | فاسقة | Slut |
| Physical Sexual Cyberbullying | — | — |
| Religious Cyberbullying | خاين | Traitor |
| Political Cyberbullying | ذنب الحكومة | Government tail |
| Looking Cyberbullying | أنتِ بشعة | You are ugly |
| Animal Phrases Cyberbullying | بومة | Owl |
| Racism Cyberbullying | عبد أسود | Nigger |
| Sectarian Cyberbullying | شيعي حاقد | Hateful Shiite |
| Cross-Cultural Cyberbullying | شحاد | Beggar |
| Psychological Cyberbullying | انت واحد ما بتستحي، غبي | Shame on you, stupid |
| Praying Negatively for Person | الله ياخد روحك | God take your soul |
| General Cyberbullying | تفه عليك | Spit on you |
| Non Cyberbullying | انا بعرفك | I know you |
3.2 Deep Learning Approaches
3.2.1 Convolutional Neural Networks
Convolutional neural networks (CNNs) are one of the most effective deep learning techniques for computer vision and image recognition applications. CNNs are built to automatically recognize patterns and characteristics inside images and are inspired by the way the brain interprets visual information. A CNN uses many layers to analyze the data, doing away with the need for manual feature extraction. CNNs are also a powerful NLP technique that has produced excellent results.
CNNs are frequently employed in computer vision tasks and can be applied to images, documents, audio files, and video. They are commonly used in hierarchical data categorization to identify patterns. A CNN is, in essence, a mathematical model that simulates the neural networks of the brain. Some use only the two main layers (input and output), while others utilize three straightforward levels: input, hidden, and output layers (Ibrahim et al. 2019). The past several years have seen a great deal of NLP and sentiment analysis research with promising outcomes, in which the CNN and LSTM algorithms clearly beat single models (SVM, NB and ME). To represent a sentence as a collection of words, a CNN can be employed using a single layer of convolution.
CNN Convolutional Layer
The convolutional layer, which applies a group of learnable filters or kernels to the input image, is the fundamental component of a CNN. To create a feature map, each filter performs a dot product operation as it moves over the input image. The feature map shows where on the input image that filter was activated. In Figure 2, a 3x3 filter is applied to a 5x5 input image to create a 3x3 feature map: to determine the value of each pixel in the feature map, the filter traverses the input image, executing a dot product at each position.
Figure 2: CNN Convolutional Layer
CNN Pooling Layer
A non-linear activation function, such as the rectified linear unit (ReLU), is then applied to the output of the convolutional layer to introduce non-linearity into the network. A pooling layer follows, which lowers the dimensionality of the feature maps and increases the network's computational efficiency. Figure 3 illustrates how a 4x4 input feature map is subjected to a 2x2 max pooling operation to produce an output feature map: the pooling procedure decreases the spatial dimensions of the output by choosing the largest value in each 2x2 window of the input feature map.
Figure 3: CNN Pooling Layer
Further convolutional and pooling layers are then applied to the pooled feature maps, each of which learns increasingly intricate patterns and information from the layer before it.
Figure 4: CNN Architecture
Convolutional Neural Network Architecture: after the output of the final convolutional layer is flattened into a 1D vector, one or more fully connected layers perform classification or regression based on the learned features. The architecture of a typical CNN, a number of convolutional and pooling layers followed by one or more fully connected layers, is depicted in Figure 4.
Visualizing Feature Maps: Figure 5 demonstrates how the feature maps that a CNN has learned can be visualized in order to inspect the patterns and features the network has discovered. For the first convolutional layer, each row represents a separate filter, and the columns display the activations of that filter across different input images.
Figure 5: Visualizing in CNN
3.2.2 Recurrent Neural Networks (RNN)
Recurrent neural networks (RNNs) are a class of neural network designed specifically for processing sequential data, including time series, audio signals, and spoken language. Unlike feedforward neural networks, which analyze each input individually, RNNs are intended to keep a memory of prior inputs and use that knowledge to make predictions about future inputs.
A key characteristic of a Recurrent Neural Network (RNN) is the presence of at least one feedback connection, which allows activations to cycle around in a loop. This gives the network the ability to process temporal information and learn sequences, for example to recognize and reproduce sequences or to anticipate temporal relationships between events. Recurrent architectures can take many distinct shapes. One typical version consists of a conventional Multi-Layer Perceptron (MLP) with an additional feedback loop, giving the network a form of memory while retaining the MLP's powerful non-linear mapping capabilities. Other variants have more uniform architectures, with every neuron connected to every other neuron and/or stochastically activated neurons. For basic structures and deterministic activation functions, training can use gradient descent techniques similar to those that lead to the back-propagation algorithm for feed-forward networks.
In sequential tasks such as speech and natural language processing, the current input always depends on the inputs applied in the past, and finding the connection between the current input and the previously applied inputs is the job of RNNs. RNNs can theoretically use information from sequences of unlimited length, but in practice they can only look back a short distance. When an RNN is unrolled into a complete network, the same layer structure is simply repeated across the whole sequence. The input at time step t is Xt; the vector Xt can have any size N. The hidden state at time step t is At; it is the network's "memory" and is computed from the input at the current step and the previous hidden state, as shown by
At = f(W Xt + U At-1)
Here W and U denote the weights applied to the input and to the previous state value, respectively, and f is the non-linearity applied to the sum to produce the new cell state.
One of the appeals of RNNs is their potential to connect prior knowledge to the task at hand, for example using prior video frames to help interpret the present frame; RNNs would be very helpful if they could accomplish this. Sometimes we only need to look at the most recent data to complete the task. Consider a language model that uses the preceding words to predict the next word: no further context is needed to determine the final word of "the clouds are in the sky", because it is very clear that "sky" will come next. In such cases, where the gap between the relevant information and the place where it is needed is small, RNNs can learn to use the prior information. Theoretically, RNNs can also handle such "long-term dependencies" with ease; for toy problems of this kind, a person could carefully hand-pick parameters that make it work. Regrettably, RNNs do not seem able to learn them in practice.
3.2.3 Long Short-Term Memory Networks (LSTM)
LSTMs are a particular kind of RNN that is able to learn long-term dependencies. They are now widely used and perform remarkably well on a wide range of problems. LSTMs are designed explicitly to avoid the long-term dependency problem; retaining information over long periods is effectively their default behavior. All recurrent neural networks have the form of a chain of repeating neural network modules.
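For concreteness, the plain recurrent update At = f(W Xt + U At-1) described above can be sketched in a few lines of NumPy. The dimensions, weights, and input values below are arbitrary illustrations chosen for this sketch, not parameters used in this study.

```python
# Minimal NumPy sketch of the recurrent update A_t = f(W X_t + U A_{t-1}).
# All sizes and values are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, seq_len = 4, 3, 5

W = rng.normal(scale=0.1, size=(hidden_dim, input_dim))   # input weights
U = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # recurrent weights
X = rng.normal(size=(seq_len, input_dim))                 # one input sequence

A = np.zeros(hidden_dim)          # hidden state ("memory"), A_0
for t in range(seq_len):
    # A_t = f(W X_t + U A_{t-1}), with f = tanh as the non-linearity
    A = np.tanh(W @ X[t] + U @ A)
    print(f"step {t}: hidden state = {np.round(A, 3)}")
```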
In standard RNNs, this repeating module consists of just a single tanh layer, for example.
Figure 7: Basic architecture of Recurrent Neural Networks
In all traditional recurrent neural networks, the weight matrix associated with the connections between the neurons of the recurrent hidden layer can end up multiplying the gradient signal a very large number of times (as many times as there are time steps) during the gradient back-propagation phase. This means the size of the weights in the transition matrix can significantly affect the learning process. If the weights in this matrix are small (more precisely, if the leading eigenvalue of the weight matrix is smaller than 1.0), the gradient signal can become so small that, with stochastic gradient descent, learning either becomes extremely slow or stops working altogether; this vanishing of the gradient makes learning long-term dependencies in the data more difficult. On the other hand, if the weights in this matrix are large (more precisely, if the leading eigenvalue of the weight matrix is larger than 1.0), the gradient signal can become so large that learning diverges; this is commonly called the exploding gradient problem. These difficulties are the primary motivation for the LSTM model, which introduces a new component called a memory cell. A memory cell is made up of four essential components: an input gate, a neuron with a self-recurrent connection (a connection to itself), a forget gate, and an output gate. With a weight of 1, the self-recurrent connection guarantees that, without external interference, the state of a memory cell can remain constant from one time step to the next. The gates control how the memory cell interacts with its surroundings: the input gate can block an incoming signal or allow it to change the state of the memory cell; the output gate can permit or prevent the state of the memory cell from influencing other neurons; and the forget gate modulates the self-recurrent connection, allowing the cell to remember or forget its previous state as needed, as shown in Figure 8.
Figure 8: LSTM Single Cell
The input gate (i), forget gate (f), and output gate (o) are the three gates in an LSTM, and g is the candidate input to be written into the cell at the current time step. Following the recurrent network principle, each of these gates and the update signal depend on the cell's current input as well as the previous hidden state.
3.2.4 Hybrid CNN-RNN
CNN-RNN is a hybrid neural network design in which convolutional and recurrent neural networks are combined to analyze sequential data such as time series and natural language text. The CNN-RNN architecture has been widely used in many applications, including speech recognition, image captioning, and video analysis, because it can detect both local and global patterns in sequential data. The architecture is made up of two parts: a CNN component and an RNN component. The CNN component extracts local patterns from the input sequence, while the RNN component captures the long-term relationships between those patterns. The CNN component applies a series of convolutional filters to the input sequence to extract local patterns; each feature map in its output captures a distinct local pattern from the input sequence.
The RNN component then receives the feature maps, analyzes the sequence of feature maps, and captures the long-term relationships between the local patterns. The RNN component can be a basic RNN, a gated RNN such as an LSTM or GRU, or a bidirectional RNN. An example of a more sophisticated CNN-RNN architecture is shown in Figure 10: a group of photos serves as the input sequence, and the CNN component extracts regional patterns from each image; the output of the CNN component is a series of feature maps, each of which reflects a distinct local pattern in the picture. CNNs have been applied to text classification tasks as well as image recognition problems. CNNs use sliding-window techniques to extract local characteristics from the input, enabling them to recognize significant patterns and features at various scales. Nevertheless, CNNs are less successful at tasks that require comprehension of the context of the full text, because they do not capture long-range relationships between words.
Figure 9: CNN-LSTM Architecture
3.2.5 A comparison between deep learning models
Deep learning is the category of machine learning algorithms in which complex patterns and relationships are learned from big datasets. This section contrasts a few of the most popular deep learning algorithms and points out their advantages and disadvantages.
1- Convolutional Neural Networks (CNNs): CNNs are typically employed in image recognition and video analysis tasks. Convolutional layers are used to learn spatial patterns and extract characteristics from images. CNNs can manage changes in object size, shape, and orientation and are noise-resistant. However, CNNs may need a lot of data to learn complicated characteristics, and the training procedure can be computationally expensive.
2- Recurrent Neural Networks (RNNs): RNNs are applied to sequential data in time series analysis, speech recognition, and natural language processing. RNNs contain a feedback mechanism that enables them to process each input in the sequence using their internal state. RNNs can handle variable-length sequences and are effective at capturing long-term relationships in the data. Yet the vanishing gradient problem that RNNs experience can make it challenging to learn long-term dependencies.
3- Long Short-Term Memory Networks (LSTMs): LSTMs are a particular RNN type that addresses the vanishing gradient issue. In LSTMs, the flow of information is controlled by gating mechanisms, while memory cells are used to retain data over extended periods of time. With their ability to handle variable-length sequences, LSTMs excel at modeling sequences with long-term dependencies. Nevertheless, learning complicated patterns with LSTMs can be computationally costly and requires a lot of data.
4- Generative Adversarial Networks (GANs): GANs are used for generative tasks including text creation and image synthesis. Two neural networks make up a GAN: a generator network that generates new data and a discriminator network that aims to separate generated data from real data. GANs are capable of producing high-quality samples and can learn intricate distributions. GANs, however, can be challenging to train and may experience mode collapse, in which the generator only produces a narrow range of samples.
5- Transformers: the transformer is a neural network design employed mainly for natural language processing tasks.
Transformers recognize long-range relationships between words in a sentence via self-attention mechanisms. State-of-the-art performance on a variety of natural language processing tasks, including sentiment analysis, question answering, and machine translation, has been attained using transformers. Transformers, however, can be computationally costly and need a substantial quantity of training data.
3.3 Data Pre-Processing
In both NLP and SA, the pre-processing stage is important. It ensures higher performance in sentiment analysis and helps to improve the state of the gathered dataset. The standard pre-processing techniques include text cleaning, normalization, tokenization, stop word removal, and stemming. The preprocessing procedure can be carried out in a number of steps, depending on the purpose of the study and the language used. The effectiveness of Arabic sentiment categorization employing a dedicated pre-processing step has been assessed in previous studies; using a customized stop list and handling emoticons are part of those procedures. The results show that replacing emoticons and using customized stop lists and content phrases can make Arabic sentiment classification more efficient. Data cleansing is a crucial step in enhancing data quality and improves the task of detecting data polarity. Following data collection, the cleaning stage eliminates misspellings and slang terms. Given the variety of dialects and Arabic language varieties, this is difficult to apply to Arabic text. When it comes to the linguistic domain for NLP, Arabic is a difficult language to work with: it requires sophisticated pre-processing because of its morphological challenges and range of dialects. A useful social media corpus cleaning procedure deletes usernames, hashtags, URLs, punctuation, and extra whitespace; depending on the task, it may also eliminate lengthy words, special characters, numerals, and English text. It is therefore very important to pre-process and assess raw Arabic content.
3.3.1 Data Cleaning
In this part, several data preprocessing and cleaning steps were applied to the collected tweets, both to reduce the complexity of the Arabic language and to improve the quality of the results. The preprocessing stage was implemented in Python. Figure 11 and the following steps describe the preprocessing and cleaning pipeline:
1) Data Collecting.
2) Data Uploading and Reading: the collected dataset was loaded using the standard Python IO library and read into a data frame to perform the preprocessing steps.
3) Noise Elimination:
● Eliminating all non-Arabic tweets and RT (Re-Tweet) markers.
● Removing Arabic and English numerals like (1, 2) and punctuation like (،, ؟).
● Deleting URLs, @-mentions, emoticon stickers, hyperlinks and hashtags.
● Removing stop words like (ما, عشان) and Latin characters.
● Eliminating repeated letters like (حلووووو) and repeated words.
● Removing diacritics and tatweel.
4) Normalization:
● Substituting the character (ة) with (ه).
● Replacing the character (ى) with (ي).
● Substituting the characters (إ, أ, آ) with (ا).
5) Stop word removal: the method of eliminating words from sentences that only serve to structure language and add little to the primary content, such as "a", "are", "the" and "was". These typically frequent terms do not make the sentiment analysis or categorization task any easier, and because no essential information is lost, removing them helps to reduce the corpus size.
6) Tokenization: separating the raw text string at whitespace into a list of distinct words, a crucial step for the majority of NLP jobs. A sentence or document is divided into tokens, which are words or phrases. In languages like English, Arabic, and French, where spaces are used to separate words, it is a straightforward process; in languages like Japanese, Chinese, and Thai, where words are not separated, it is more difficult. Tokenization can be applied to Arabic since most words are separated by spaces.
7) Deleting irrelevant instances: instances that are not relevant to the task at hand are identified and removed. For example, if the aim is to classify sentiment, instances that do not convey any emotion are removed.
Figure 10: Data Preprocessing and Cleaning
3.3.2 Vectorization and Word Embedding
Deep neural networks frequently employ vectorization and word embedding for natural language processing (NLP) applications such as sentiment analysis, machine translation, and text categorization. Text data must be vectorized, that is, transformed into a numerical representation, before it can be fed into a deep neural network. The bag-of-words (BoW) model, which represents a document as a vector of word counts, is a popular vectorization method. Another method is the term frequency-inverse document frequency (TF-IDF) model, which gives each word a weight depending on how frequently it appears in the text and how frequently it appears in the corpus as a whole. Yet both the BoW and the TF-IDF models disregard the meaning and context of the words in a document, and this is where word embedding comes in. With word embedding, each word is converted into a dense vector in a high-dimensional space in which similar words are clustered together. Word2vec is one of the most popular word embedding methods used in NLP. Word2vec is a neural network technique that learns distributed word representations and produces accurate representations without the need for labels. Given enough training data, the network generates word vectors with interesting properties. Word2vec comes in two variants: the Continuous Bag-of-Words model (CBOW) and the Continuous Skip-gram model. In essence, CBOW predicts the word from the provided context, whereas Skip-gram predicts the context from the provided word.
Using the Word2Vec method, Rehman et al. (2019) trained word embeddings in which Word2Vec converts the text into a vector of numerical values that reflects the meanings of the terms, computes the distance between words, and groups similar words together. Their model includes a set of features extracted using convolution and global max-pooling layers in addition to long-term dependencies; to improve accuracy, the model additionally employs a dropout function, normalization, and a rectified linear unit. In a different study, the authors constructed Word2Vec models using a corpus obtained from several Arabic-language publications, applied CNN with various text feature extractions and machine learning techniques to this dataset, and reported increased accuracy for the sentiment categorization task. An automatic Arabic lexicon was developed using the best Word2Vec model and used with various ML techniques.
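As an illustration of the CBOW and Skip-gram variants described above, a Word2vec model can be trained with the gensim library roughly as follows. The library choice (gensim 4 or later), the toy sentences, and all parameter values are assumptions made for this sketch, not the setup used in the studies cited above.

```python
# Illustrative only: training CBOW and Skip-gram Word2vec models with gensim
# on a handful of toy (already cleaned and tokenized) Arabic sentences.
from gensim.models import Word2Vec

sentences = [
    ["عيش", "الحياة", "بسعادة"],
    ["انت", "انسان", "طيب"],
    ["الحياة", "جميلة", "بسعادة"],
]

# sg=0 selects CBOW (predict the centre word from its context);
# sg=1 selects Skip-gram (predict the context from the centre word).
cbow_model = Word2Vec(sentences, vector_size=100, window=3, min_count=1, sg=0)
sg_model = Word2Vec(sentences, vector_size=100, window=3, min_count=1, sg=1)

# Each word is now a dense vector; with a large enough corpus,
# words used in similar contexts end up close together.
print(cbow_model.wv["الحياة"].shape)            # (100,)
print(sg_model.wv.most_similar("الحياة", topn=2))
```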
31 The goal of the Bag of Words (BoW) model is to convert the provided text into a more straightforward numerical representation known as a vector. In order for the computer to interpret the sentence, it is then expressed as a string of numbers. Each vector component also indicates whether a certain word is present or absent. In order to show how frequently a term appears, frequency counts are also used. The model replaces this with a binary representation, either a 1 or a 0. A vector with the same size as the vocabulary is the outcome of BOW. Word embedding, on the other hand, has a vector size that is significantly less than the vocabulary size. Word embedding thereby addresses (BOW) model's flaws. Similar qualities serve as representations for the parallel texts. Sentiment analysis and text categorization are only two NLP applications that make use of word embedding. The effectiveness and accuracy of sentiment analysis were enhanced in numerous experiments by the use of word embedding. BOW still has limitations even though it is frequently used to evaluate emotion. For instance, inconsistent wording might cause numerous publications to portray the same thing since they use the same phrases. BOW also ignores the semantics of words. Arabic tweets on violent incidents were gathered and categorized into two categories of sentiments. Their approach is based on Google's Word2vec scheme, which highlights the meaning of words through the use of a deep learning-inspired technique. Weighted average and Word2vec were used to display tweets. Sentiment was also measured using SVM and Random Forest machine learning methods. We have carried out some research to validate our approaches using the cross-validation methodology. Deep neural networks may be better able to comprehend the meaning of the text if word embeddings are used to capture the semantic links between words. The terms "good" and "great," for instance, would be situated close to one another in the embedding space in a sentiment analysis. 3.3.3 Continuous Bag-of-Words Model (CBOW) In natural language processing, the Continuous Bag-of-Words (CBOW) model of neural networks is frequently employed for word embedding (NLP). The Word2Vec technique, which uses CBOW as a variation, is made to learn vector representations of words based on their co-occurrence in a sizable corpus of text data. 32 The CBOW model is a shallow neural network that predicts the target word in the middle of a fixed-length window of context words. An input layer, a hidden layer, and an output layer make up the model. The context words are represented as one-hot vectors in the input layer. The input vectors are first transformed linearly by the hidden layer before being activated by a nonlinear function, such as the hyperbolic tangent (tanh) function. A softmax operation is used in the output layer to create a probability distribution across the vocabulary, where each element reflects the likelihood that the target word will really be that specific word. To reduce the cross-entropy loss between the predicted probability distribution and the real probability distribution, which is represented as a one-hot vector for the target word, the hidden layer's weights are modified during training using stochastic gradient descent (SGD). Although they are updated, the input layer's weights are not used for prediction. 
The weights of the hidden layer can be utilized as word embeddings for further NLP tasks, such as sentiment analysis, machine translation, and text classification, after the CBOW model has been trained. The word embeddings, which are based on the co- occurrence of terms in the training corpus, capture the semantic links between words. The CBOW model has the benefit of being computationally more effective than alternative neural network architectures for word embedding, such the Skip-gram model. Word embeddings for frequently occurring words that appear in the training corpus can be learned using the CBOW model. 3.3.4 Skip-Gram (SG) model In natural language processing, the Skip-Gram (SG) model is a typical neural network design for word embedding (NLP). The Word2Vec approach, which the SG model is a variation of, is made to learn vector representations of words based on their co- occurrence in a sizable corpus of text data. The SG model is a shallow neural network that predicts a fixed-length window of context words that are present around the target word given as input. An input layer, a hidden layer, and an output layer make up the model. The target word, which is encoded as a one- hot vector, is represented in the input layer. The input vector is transformed linearly by the hidden layer before being activated by a nonlinear function, such as the hyperbolic tangent (tanh) function. One neuron is present in the output layer for each context word in the 33 window. With the use of a softmax operation, each neuron creates a probability distribution across the vocabulary, where each element denotes the likelihood that the context word will contain that specific word. To increase the likelihood of correctly predicting the context words given the target word, stochastic gradient descent (SGD) is used to adjust the hidden layer's weights during training. Although they are updated, the output layer's weights are not used for prediction. The hidden layer's weights can be utilized as word embeddings for further NLP tasks, such sentiment analysis, machine translation, and text classification, after the SG model has been trained. The word embeddings, which are based on the co-occurrence of terms in the training corpus, capture the semantic links between words. The SG model has the benefit of being more successful than the Continuous Bag-of- Words (CBOW) model at learning word embeddings for uncommon words that only sometimes appear in the training corpus. The SG model is also good at capturing the syntactic and semantic links between words, such as verb-object and adjective-noun pairings. 3.3.5 Feature Extraction Tf-idf and word embeddings are the two primary straightforward and issue- independent feature extraction approaches that we used. The term frequency-inverse document frequency (tf-idf term weight) measures how pertinent a word is to each document in a collection of documents. To compare against utilizing word embeddings as features, we solely utilize tf-idf weights with traditional machine learning methods in our tests. The most common distributed representation of words is word embeddings, which brings us to our second point (or terms). Every word in the vocabulary is represented as a vector with a few hundred dimensions, where words with the same meaning are near together and words with different meanings are far away. Using the situations in which the words are used, one may learn the vector representation of the words. 
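Of the two feature types just described, the tf-idf weighting can be computed with scikit-learn as in the following sketch. The example tweets are invented placeholders, and the library choice is an assumption made for illustration; it is not the exact feature-extraction code used in this study.

```python
# A small scikit-learn sketch of tf-idf feature extraction for short texts.
from sklearn.feature_extraction.text import TfidfVectorizer

tweets = [
    "عيش الحياة بسعادة",
    "انت انسان طيب",
    "شفتك يا قزم",
]

# Each tweet becomes a sparse vector whose entries weight a term by how often
# it occurs in that tweet and how rare it is across the whole collection.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(tweets)

print(X.shape)                                   # (3, vocabulary size)
print(vectorizer.get_feature_names_out()[:5])    # first few vocabulary terms
```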
Word2Vec is one of the most widely used methods for effectively learning a solitary word embedding from a text corpus (Mikolov et al., 2013). Skip Gram and Continuous Bag of Words are two alternative instructional approaches for learning the embeddings (CBOW). The continuous skip-gram takes the current word as input and learns the embeddings by predicting the surrounding words, whereas the CBOW model takes the current word as input and learns the embeddings by predicting the surrounding words (Mikolov et al., 2013). We employed the pre-trained Arabic word embedding model AraVec2.0 (Soliman et al., 34 2017) in our experiments. This model offers a variety of pre-trained Arabic word embedding model architectures, each of which is trained on one of three distinct datasets: tweets, web pages, and Wikipedia Arabic articles. Moreover, two models are created for every dataset, one using Skip Gram and the other using CBOW. As we focus on tweets for our research, we employed the pre-trained Skip-Gram 300D-embeddings trained on more than 77M tweets. The pre-trained model was used to both traditional and neural learning techniques. The average vector of all the tweet word embeddings is calculated and utilized as the feature vector of the tweet in order to use it with traditional learning techniques. In contrast, the embedding vectors are employed in neural learning models to set the weights of the embedding layer, which is subsequently linked to the other layers in the network. 3.3.6 Annotation It is required to annotate the data in order to provide labels for the network to learn from while training a Convolutional Neural Network (CNN) for Arabic text analysis. Annotation is the process of labeling or marking up data to produce a ground truth for machine learning models that can be utilized for training and evaluation. When annotating Arabic literature, certain elements of the text, such as words, sentences, or sentiment, are identified and labeled. Creating the categories or labels that will be used to categorize the text is the first stage in the annotation process. Sentiment (positive, negative, or neutral), topic (politics, sports, or entertainment), or entity are some examples of these categories (person, organization, location). The annotator can start labeling the text in accordance with the categories after they have been established. Several techniques can be used to annotate Arabic text, depending on the work at hand and the resources at hand. Using human annotators to manually label the data is one such approach. Although this method might be time-consuming and costly, it guarantees excellent annotations and permits the incorporation of subtle descriptors. Using automated annotation tools like named entity or part-of-speech taggers is another option. To automatically label the text, these solutions make use of pre-existing models and rules, which can save time and effort. Yet the quality of the annotations could be inferior to that of those created by human annotators, and the tools might not be able to handle all kinds of annotation assignments. 35 The annotated data may then be utilized to train a CNN for Arabic text analysis. In order for the CNN to anticipate what will happen with fresh, unlabeled material, it must learn to identify patterns and characteristics in the text that are connected to the classified categories. It is a popular machine learning approach to carry out this procedure, which is referred to as supervised learning. 
Since there is no consensus on what constitutes cyber bullying speech, identifying it can be difficult. The classification of a statement as hate speech is a subjective process based on individual opinion. The following principles were adopted to help human annotators recognize cyberbullying in order to lessen this subjectivity [17,18]. Hence, a post is deemed bullying if it exhibits one or more of the following traits: 1. Using offensive adjectives, terms, or slurs to insult or defame a particular group. 2. Justifying or defending bullying crimes. 3. Supporting and fostering hatred. 4. Promoting one group's superiority over another. 5. Making threats and calling for violence. 6. Disparaging and negative stereotypes. 7. Use of irony and jokes to make fun of and humiliate the target due to a protected attribute. 8. Unique situations: Self-attacking, in which the speaker uses derogatory language to attack a protected attribute of himself. 3.3.7 Training Data and Test Data A crucial machine learning and data science approach is the division of a dataset into training and test data. With this method, a portion of the data is used to train a model and a different portion is used to assess the model's performance. This approach aims to prevent overfitting, which happens when a model grows overly complicated and performs admirably on training data but badly on test data. Overfitting occurs when a model learns to fit the noise in the training data instead of the underlying patterns, which makes it difficult for the model to generalize to new data. 36 Choosing how much of the dataset will be utilized for training and how much for testing is the first stage in dividing the dataset. Generally, training uses 70–80% of the data, whereas testing uses the remaining 20–30%. To guarantee that the classes are equally represented in the training and test data, this split can be carried out at random or with the use of a stratified sampling strategy. Following the data's division, the model is trained using the training set and assessed using the test set. Depending on the nature of the issue and the model's goal, several evaluation metrics are employed to evaluate the model's performance. For instance, standard assessment measures for classification problems include accuracy, precision, recall, and F1 score. It's vital to remember that the performance of the model on test data rather than training data is a stronger measure of its capacity to generalize to new data. It is crucial to adjust the model's hyperparameters depending on the performance of the test data rather than the performance of the training data. Creating a deep learning model requires separating a dataset into training and test data. It aids in avoiding overfitting, evaluating the model's effectiveness, and fine-tuning its hyperparameters for improved generalization. 3.3.8 Metrics Evaluation Accuracy, precision, recall, and F1 score are metrics used to evaluate the performance of a classification model. They are particularly useful in assessing the performance of models that aim to predict binary outcomes, such as spam/not spam, fraud/not fraud, or positive/negative. Accuracy is the simplest and most commonly used metric for classification models. It measures the percentage of correctly predicted instances among all the instances in the dataset. In other words, accuracy tells us how well the model can classify instances into their respective categories. 
It is calculated as:
Accuracy = (true positives + true negatives) / (true positives + false positives + true negatives + false negatives)
Precision is the proportion of true positives (correctly predicted positive instances) among all the instances that the model predicted as positive. In other words, it tells us how precise the model is in predicting positive instances. Precision is calculated as:
Precision = true positives / (true positives + false positives)
Recall is the proportion of true positives among all the actual positive instances in the dataset. In other words, it tells us how well the model can identify positive instances. Recall is calculated as:
Recall = true positives / (true positives + false negatives)
The F1 score is the harmonic mean of precision and recall. It balances precision and recall and is useful when we want to treat both metrics as equally important. The F1 score is calculated as:
F1 score = 2 * (precision * recall) / (precision + recall)
A high accuracy score does not necessarily mean that the model is performing well, especially when dealing with imbalanced datasets where the distribution of classes is uneven. For instance, in a dataset where the majority of instances belong to one class, a model that always predicts that class will have high accuracy but poor precision and recall. Therefore, it is essential to consider precision, recall, and F1 score alongside accuracy when evaluating the performance of a classification model.
4 CHAPTER FOUR: DATASET CORPUS
With the use of mobile applications, the internet, and social media portals, there has been a tremendous increase in data in recent years. The rapid development of technologies and social media platforms now allows people to share their opinions about particular topics, and people all around the world use these social media sites to express their views. Technologies such as machine learning (ML) and deep learning (DL) have made it simpler to identify cyberbullying by evaluating previously posted content. The importance of movie reviews in determining a film's rating and financial success has also grown, and movie producers can use such critiques to adjust their productions in response to critics' assessments. Humans cannot, however, analyze all of this data to determine whether a person's opinion is favorable or unfavorable, owing to the vast number of reviews and the amount of information gathered. To improve the production and analysis of sentiment analysis, we must rely on emerging ML technologies.
4.1 Data Collection
A detailed description of the proposed data gathering methods is given in this section. The three Arabic language varieties (Classical Arabic, MSA and Dialectal Arabic (DA)) are all present in combination on Arabic social media sites. Even though there are some commonalities across the three categories, there are also significant distinctions that can lead to poor SA performance. The Python programming language and its libraries are among the most flexible and popular tools used in data analytics, notably for machine learning. Corpora can be annotated in several ways, two of which are the manual technique, which relies on human labor, and the automatic method, which uses an annotation tool. Figure 11 shows an example of the raw data crawled from the Twitter API.
Figure 11: Raw data crawled from Twitter
4.2 Two-Class Classification
The first dataset, which represents the two-class classification, was produced by querying Twitter accounts through the Twitter API to find tweets that did and did not involve cyberbullying. Around 13,000 tweets were collected between October 2022 and March 2023; the characteristics of this two-class dataset are displayed in Table 2 below. The content of the tweets was extracted into an Excel file and then, after preprocessing the tweets, transferred to a CSV file, because this format is compatible with a variety of text processing tools. Table 3 presents an example of the collected data for the first dataset, labeled as Cyberbullying (CB) or Not Cyberbullying (NOT-CB).
Table 2: Characteristics of the two-class classification
| No. of tweets | 13481 |
| Start date | Oct 2022 |
| End date | March 2023 |
Table 3: An example of the collected data for the first dataset
| Tweet | Translation in English | Category |
| شفنا بس انت اغبى من الغباء | We saw, but you are dumber than stupidity itself | Cyberbullying |
| عيش الحياة بسعادة ولا تنكد على نفسك | Live life happily and don't feel bad about yourself | Not Cyberbullying |
4.3 Six-Class Classification
The second dataset, which represents the six-class classification, was created by gathering tweets through the Twitter API based on the specified protected features; the offensive texts identified at this second level were further divided into the following six categories:
● Sexual cyberbullying (SCB)
● Animal phrase cyberbullying (ACB)
● Looking cyberbullying (LCB)
● Religious cyberbullying (RCB)
● Psychological cyberbullying (PSCB)
● Not cyberbullying (NCB)
A total of 15,486 tweets were gathered between September 2022 and February 2023; Table 4 below shows the characteristics of the six-class dataset.
Table 4: Characteristics of the six-class classification
| No. of tweets | 15486 |
| Start date | Sep 2022 |
| End date | Feb 2023 |
Table 5 provides instances of real tweets that have been classified as belonging to one of the six cyberbullying categories mentioned above.
Table 5: An example of the collected data for the six-class classification
| Tweet | Translation in English | Category |
| انت حمار بالفطرة بس عندك اجتهاد وتحاول تثبت انك حمار | You are a donkey by nature, but you make an effort and try to prove that you are a donkey | Animal Phrase Cyberbullying (ACB) |
| متخلف صهيوني يهودي | A backward Zionist Jew | Religious Cyberbullying (RCB) |
| شفتك يا قزم | I saw you, dwarf | Looking Cyberbullying (LCB) |
| يا فاسقة انتي واهلك | Oh you whore, you and your family | Sexual Cyberbullying (SCB) |
| يا تربية الشوارع يا غبي | Oh street breeding, you idiot | Psychological Cyberbullying (PSCB) |
A total of about 30,000 tweets were collected through the Twitter API for this study across the first and second datasets in order to detect cyberbullying in Arabic social media.
5 CHAPTER FIVE: METHODOLOGY
We approach the identification of cyberbullying as a supervised learning problem. The essential purpose of the presented methodology is cyberbullying detection on social platforms in order to decrease bullying behavior [65]. The proposed technique can be implemented to assist in fighting cyberbullying and to detect offensive written text in social media services. Furthermore, cyberbullying detection on social media can be considered an effective way to assist and protect cyberbullying victims [66]. In this work, we tested a number of neural learning models that were trained to find Arabic cyberbullying tweets on Twitter.
The goal of the ongoing study is to provide a framework for reporting cyberbullying that government officials may use to make strategic decisions. The framework consists of five main modules with the following purposes:
1) Automatic tweet extraction.
2) Data cleaning and pre-processing.
3) Identification of cyberbullying categories.
4) Collection of the tweets posted by the identified cyberbullies regarding a particular event.
5) Cyberbullying analysis and reporting (see Figure 12).
This section describes the study methodology used to assign each tweet to the proper cyberbullying class. The stages of the presented methodology are illustrated in Figure 12 and discussed in the following sections. The methodology of this study consists of four stages. In the first stage, the data is collected using the Twitter API and annotated using Python. In the second stage, the collected and annotated data is passed through data preprocessing and data cleaning to eliminate undesirable symbols and tokens. In the third stage, three deep learning models, a CNN model, an LSTM model and an RNN model, are trained on the data. The final stage is the evaluation, which is applied to the models to examine the effectiveness of the proposed techniques.
Figure 12: Methodology Structure
5.1 Data Preprocessing Methodology
There are surprisingly few datasets available for the study of online social media forums, presumably as a result of the challenges in developing such datasets under the stringent constraints imposed by social media networks. The current research therefore developed two custom datasets using data collected in two stages: tweets addressing a variety of specified themes were gathered in order to identify cyberbullying in each designated issue domain.
5.1.1 Data Preparation
Data must be preprocessed before being used for sentiment analysis. Preparing the dataset and normalizing it is the initial preprocessing step, which enables the classification algorithm to provide fast and accurate results. This stage can be separated into four operations:
1- Transformation: URLs, hashtags (#), mentions (@), leading and trailing punctuation, upper-case letters (replaced by lower-case), and multiple spaces (replaced by single spaces) are all transformed.
2- Tokenization: text strings are divided into logical components, such as words or phrases (referred to as "tokens").
3- Filtering: stop words (common terms like the, a, an, in, etc.) are ignored by search engines both when indexing entries and when retrieving entries in response to search queries, so these words are eliminated.
4- Normalization (lemmatization): this technique, often referred to as "lemmatization" (from "lemma", the dictionary form), identifies and groups the different written forms of a single word so that they can be analyzed as one item.
5.1.2 Collecting Data
Twitter is used for Arabic social media data collection. The only requirement for data collection is to register the project in the Twitter developer portal to obtain the consumer key, consumer secret, access token, and access secret token, which are then used in the Python code. Twitter provides an Application Programming Interface (API) to assist in collecting the tweets.
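A hedged sketch of how this collection step might look in Python is shown below, assuming the tweepy library (version 4) and the standard search endpoint; exact method names depend on the tweepy and Twitter API versions. The credentials are placeholders obtained from the developer portal, the keyword list is illustrative, and the regular expressions only approximate the cleaning and normalization steps described in Sections 3.3.1 and 5.1.1; this is not the exact code used in this study.

```python
# Hedged sketch: collect Arabic tweets by keyword and apply simple cleaning.
import re
import tweepy

CONSUMER_KEY = "..."       # placeholders from the Twitter developer portal
CONSUMER_SECRET = "..."
ACCESS_TOKEN = "..."
ACCESS_SECRET = "..."

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)
api = tweepy.API(auth, wait_on_rate_limit=True)

ARABIC_DIACRITICS = re.compile(r"[\u0617-\u061A\u064B-\u0652\u0640]")  # harakat + tatweel

def clean_tweet(text: str) -> str:
    """Approximate the normalisation steps of Sections 3.3.1 / 5.1.1 (illustrative regexes)."""
    text = re.sub(r"http\S+|@\w+|#\w+|RT", " ", text)   # URLs, mentions, hashtags, re-tweet markers
    text = re.sub(r"[A-Za-z0-9٠-٩]", " ", text)          # Latin letters and digits
    text = ARABIC_DIACRITICS.sub("", text)               # diacritics and tatweel
    text = re.sub("[إأآ]", "ا", text)                     # normalise alef forms
    text = text.replace("ة", "ه").replace("ى", "ي")
    text = re.sub(r"(.)\1{2,}", r"\1", text)             # collapse repeated letters
    return re.sub(r"\s+", " ", text).strip()

keywords = ["فاسقة", "خاين", "قزم"]                      # example cyberbullying seed words
rows = []
for word in keywords:
    for status in tweepy.Cursor(api.search_tweets, q=word, lang="ar",
                                tweet_mode="extended").items(100):
        rows.append(clean_tweet(status.full_text))
```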
In this research the data collection was based on keywords representing the Arabic cyberbullying types. The first dataset contained 13481 tweets, while the second dataset contained 15486 crawled tweets. The collected data was extracted as an Excel file and then transferred to CSV files. Using particular keywords is highly beneficial: the bullying tweets can be detected based on these keywords, which makes labeling the data easier. When retrieving data from Twitter, the Twitter API returns comprehensive information such as the user ID, user screen name, tweet location, time and date of the tweet, and the tweet text (i.e., the main tweet text containing information about emotions, thoughts, behaviors, and other personally salient information). We used this information to develop a set of features to classify the Twitter data efficiently and use it in different applications. All 30,000 tweets collected for this research relate to Arabic swearing words; we used these words as search seeds in the Twitter search engine.
5.1.3 Training dataset and test data
In this study, two different kinds of datasets were tested. Before applying sentiment analysis, we had to train the model and produce a machine-readable version via dataset annotation. Annotation is the process of adding explanations to a corpus collection for NLP purposes, and several annotation techniques are used depending on the annotators' objectives. In this study, the first case uses two annotations for classification (cyberbullying or not cyberbullying). The second case is annotated in more detail to specify the type of cyberbullying for each tweet and consists of six classes (sexual cyberbullying, religious cyberbullying, animal phrase cyberbullying, looking cyberbullying, psychological cyberbullying and not cyberbullying).
For the Arabic corpus collected in this study, the data was split into training and testing (hold-out) sets with a 70:30 ratio for each case, using the stratified sampling approach, which guarantees an equal class distribution (see Table 6), while Table 7 shows the split into training data and test data for both cases.
Table 6: Class distribution in the two-class and six-class datasets
| Dataset | Class | Total data |
| Two-class | CB | 5929 |
| Two-class | Not-CB | 5278 |
| Six-class | SCB | 224 |
| Six-class | ACB | 4081 |
| Six-class | RCB | 711 |
| Six-class | LCB | 969 |
| Six-class | PSCB | 5380 |
| Six-class | NCB | 4126 |
Table 7: Training data and test data in the two-class and six-class classifications
| Data | Two-Class Classification | Six-Class Classification |
| Training data | 10784 | 12384 |
| Test data | 2696 | 3096 |
The chart in Figure 13 presents the related statistics for the first and second datasets, including the training and test data in both cases, while Figures 14 and 15 show the split into training data and test data for the two cases. The dataset split into training and test data with the categories for the 2-class case is illustrated in Figures 16 and 17, while the split with classes for the 6-class case is shown in Figure 18.
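The 70:30 stratified hold-out split described above can be reproduced with scikit-learn roughly as follows; the file and column names are assumptions about the CSV layout rather than the exact ones used in this study.

```python
# Illustrative stratified 70:30 split of an annotated tweet CSV file.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("six_class_dataset.csv")        # hypothetical file name

X_train, X_test, y_train, y_test = train_test_split(
    df["tweet"], df["category"],
    test_size=0.30,             # 70% training, 30% testing (hold-out)
    stratify=df["category"],    # keep the class distribution equal in both parts
    random_state=42,
)

print(len(X_train), len(X_test))
```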
Figure 13: Related statistics of the 2-class and 6-class classification datasets
Figure 14: Test data categories in the 6-class classification
Figure 15: Training data categories for the 6-class classification
Figure 16: The dataset for the 2-class classification
Figure 17: The dataset for the 6-class classification
Basic text preparation is carried out as follows to get our dataset ready for feature extraction: punctuation, non-Arabic alphabetic and numeric characters (such as user names and URLs), and diacritical marks (such as tashdid, fatha, tanwin fath, damma, tanwin damm, kasra, tanwin kasr, sukun, and tatwil/kashida) are all eliminated, and repeated characters are removed. For more details, refer to the data preprocessing section. The remaining Arabic text is then normalized.
Figure 18: Dataset categories for the 6-class classification
5.2 Feature Extraction
The pre-trained Arabic word embedding model AraVec2.0 has been used in the experiments. This model offers a variety of pre-trained Arabic word embedding architectures, each of which is trained on one of three distinct datasets: tweets, web pages, and Wikipedia Arabic articles. Moreover, two models are created for every dataset, one using Skip-Gram and the other using CBOW. As we focus on tweets in our research, we employed the pre-trained Skip-Gram 300-dimensional embeddings trained on more than 77M tweets. The pre-trained model was applied to both traditional and neural learning techniques. To use it with traditional learning techniques, the average vector of all the tweet's word embeddings is calculated and used as the feature vector of the tweet. In neural learning models, on the other hand, the embedding vectors are used to set the weights of the embedding layer, which is then connected to the other layers in the network.
5.3 Deep Learning Models
This study conducted experiments using recurrent neural networks (RNN) and convolutional neural networks (CNN). A variety of RNN designs were tested, including Long Short-Term Memory (LSTM), Bidirectional LSTM (BLSTM), and Gated Recurrent Unit (GRU). The block diagrams of the CNN, LSTM, and combined CNN-LSTM models are displayed in Figures 20, 21, and 19, respectively:
Figure 19: CNN-LSTM architecture (Input Layer, Embedding, Conv1D, MaxPooling1D, TimeDistributed (Dense), LSTM, Dropout, Flatten, Dense)
Figure 20: CNN architecture (Input Layer, Embedding, Conv1D, MaxPooling1D, Flatten, Dense)
Figure 21: LSTM architecture (Input Layer, Embedding, LSTM, Dropout, Flatten, Dense)
5.4 Evaluation Metrics
Since we are working with unbalanced data in the 2-class and 6-class datasets, we report the overall macro F1 attained with various features across all classification levels. The confusion matrix and receiver operating characteristic (ROC) curve for the top model are also provided in order to display the area under the curve (AUROC) for the binary classification task and the macro-averaged AUROC for the six-class classification task. Equations 1, 2 and 3 show the evaluation metrics used in this study:
Precision (P) = TP / (TP + FP)   (1)
Recall (R) = TP / (TP + FN)   (2)
F1-score = 2 * P * R / (P + R)   (3)
where TP, FP and FN denote the true positives, false positives and false negatives for a given class; for the multi-class case, the scores are macro-averaged over all classes.
6 CHAPTER SIX: EVALUATION AND RESULTS
6.1 Experiment Setup
The key elements of our suggested method to identify hate speech in Arabic are presented in this section. The tweets go through a series of preprocessing stages in order to clean up and get the data ready for the training phase (see Section 4).
Finally, a number of classification methods are examined for identifying hate speech in Arabic text. In this study, we assessed three neural network architectures: a CNN-based classifier, an RNN-based classifier, and lastly a classifier that combines CNN and RNN.
6.1.1 Data Preprocessing
Before feeding the tweets to the classification models, we pre-processed the training and testing datasets using a number of pre-processing techniques. Pre-processing is an essential step when working with data that is noisy and informal in nature, such as Twitter data, and it is even more important for Arabic, given the inherent ambiguity of the language and the wide range of dialects used on Twitter. We therefore applied a number of preprocessing steps to our data before supplying it as input to the tested models; see Section 5.1.
6.1.2 Deep Learning Neural Network Models
This section describes the architectures of our assessed neural network models (CNN, RNN, and CNN + RNN). All models use Word2Vec [38] to express the features as word embeddings; the feature representation is covered in more detail in the next part. These embeddings are then fed to the neural network model (CNN, LSTM, or CNN + LSTM). The subsections that follow describe each architecture.
6.1.2.1 Feature Representation
All of the proposed architectures start with an embedding layer that uses a pre-trained word2vec model to map each tweet, represented as a sequence of integer indices, to a 300-dimensional vector space. We created our own word2vec model using the Continuous Bag-of-Words (CBOW) training approach, with a portion of the data we had gathered as the training collection. In total, the dataset included 536,000 tokens and 17.6 million tweets. All tweets were preprocessed prior to training using the same preprocessing techniques outlined above. Given that tweets are often short, we chose a window size of 3 for the model's hyperparameters; we set the vector dimension to 300 and left the remaining hyperparameters at their default values. To get the tweets ready for the embedding layer, we first built a vocabulary index using word frequency, where each distinct word in our dataset was given a distinct integer value (0 was reserved for padding). We then turned each tweet into a sequence of integer indexes and padded the sequences so that every sequence had the length of the longest tweet in the dataset. The sequences were then passed to an embedding layer, which mapped word indexes to the previously learned word embeddings. The output of the embedding layer is an (m x 300) tweet matrix, where m is the length of the longest tweet in the dataset.
6.1.2.2 CNN Architecture
The first model we assessed was a CNN model for text categorization based on Kim's work [39]. Our CNN architecture, as shown in Figure 22, is composed of five layers: the input layer (also known as the embedding layer); the convolution layer, which is essentially made up of three parallel convolution layers with different kernel sizes (2, 3, and 4); the pooling layer; the hidden dense layer; and the output layer. As discussed above, the embedding layer maps every tweet into 300-dimensional real-valued vectors. The embedding layer then sends an input feature matrix with a shape of m x 300 to a dropout layer with a rate of 0.5, where m is the length of the longest tweet.
6.1.2.2 CNN Architecture
The first model we assessed was a CNN model for text categorization based on Kim's work [39]. Our CNN architecture, as shown in Figure 22, is composed of five layers: the input layer (also known as the embedding layer); the convolution layer, which is essentially made up of three parallel convolution layers with different kernel sizes (2, 3, and 4); the pooling layer; the hidden dense layer; and the output layer. As discussed previously, the embedding layer maps every tweet into 300-dimensional real-valued vectors. The embedding layer then sends an input feature matrix with a shape of m × 300, where m is the length of the longest tweet, to a dropout layer with a rate of 0.5. The dropout layer is primarily used to lessen the issue of overfitting. The three parallel convolution layers then receive the output. Each of these layers has 100 filters, with kernel sizes of 2, 3 and 4, and a rectified linear unit (ReLU) activation function. Each layer applies its 100 filters as part of the convolution process, creating 100 new feature maps, each of size m − (kernel size − 1), i.e. an output of shape (m − (kernel size − 1)) × 100, which is then fed into a dropout layer with a rate of 0.5. These convolutional features are then fed into a global max-pooling layer, which performs down-sampling and generates three (1 × 100) vector outputs. These three vectors are concatenated and given as input to a dropout layer with a 0.5 dropout rate, followed by a fully connected dense layer containing 50 neurons. The final predictions are then generated by feeding the output to the output layer with sigmoid activation. The hyper-parameters of the CNN model are summarized in Table 8, and the architecture is demonstrated in Figure 22.

Table 8: Summary of the hyper-parameters of the CNN model
Hyper-parameter        Value
Number of filters      64
Filter size            3
Dropout layer rate     0.2
Learning rate          0.001
Number of epochs       10
Batch size             50

Figure 22: Convolutional neural network (CNN) architecture (m × 300 tweet matrix; parallel convolutional layers with kernel sizes 2, 3 and 4 and 100 filters each; dropout 0.5 and max-pooling; concatenation to a 300-dimension vector; dropout 0.5; fully connected layer and sigmoid output).

6.1.2.3 Evaluation Metrics
The evaluation was done by applying Equations 1, 2 and 3 in Section 5.4.

6.1.2.4 LSTM Architecture
Table 9 below displays the hyper-parameters for the LSTM model. For more details about the architecture of the LSTM, refer to Figure 21 in Section 5.3.

Table 9: Hyper-parameters for the LSTM model
Hyper-parameter                   Value
Number of hidden units (RNN)      16
Dropout layer rate                0.5
Learning rate                     0.001
Number of epochs                  10
Batch size                        50

Table 10: Hyper-parameters for the CNN-LSTM model
Hyper-parameter           Value
Number of filters         64
Kernel size               5
Number of LSTM units      64
Dropout layer rate        0.5
Learning rate             0.001
Number of epochs          10
Batch size                128
Pool size                 2
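For concreteness, the sketch below assembles the parallel-kernel CNN of Section 6.1.2.2 with the Keras functional API (an assumption about tooling); the 100 filters and kernel sizes 2, 3 and 4 follow the textual description, whereas Table 8 lists 64 filters of size 3 and a dropout rate of 0.2, so the exact values should be read as indicative.

```python
from tensorflow.keras import Model
from tensorflow.keras.layers import (Input, Dropout, Conv1D,
                                     GlobalMaxPooling1D, Concatenate, Dense)

def build_cnn(embedding_layer, max_len, n_outputs=1):
    """Illustrative parallel-kernel CNN for tweet classification."""
    inputs = Input(shape=(max_len,))
    x = embedding_layer(inputs)                 # (m x 300) tweet matrix
    x = Dropout(0.5)(x)

    branches = []
    for k in (2, 3, 4):                         # three parallel convolution branches
        b = Conv1D(filters=100, kernel_size=k, activation='relu')(x)
        b = Dropout(0.5)(b)
        b = GlobalMaxPooling1D()(b)             # each branch yields a 1 x 100 vector
        branches.append(b)

    x = Concatenate()(branches)                 # 300-dimensional feature vector
    x = Dropout(0.5)(x)
    x = Dense(50, activation='relu')(x)         # fully connected hidden layer
    outputs = Dense(n_outputs, activation='sigmoid')(x)   # sigmoid output layer
    return Model(inputs, outputs)

# model = build_cnn(embedding_layer, max_len)
# model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# model.fit(X, y, epochs=10, batch_size=50)     # epochs and batch size as in Table 8
```

The LSTM and CNN-LSTM variants of Tables 9 and 10 can be sketched analogously, by replacing the convolution branches with an LSTM layer or by stacking Conv1D, MaxPooling1D and LSTM layers, respectively.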
6.2 Experiments Results
This section covers the outcomes of two different sets of experiments. First are the in-domain experiments, where every model was trained on the training set of the dataset we built and evaluated on the dataset's testing set.

6.2.1 Results and Discussion
Table 11 lists the precision, recall, F1-score and accuracy obtained by our tested models (CNN, LSTM and CNN + LSTM) for the 2-classes classification, including the recall on the cyberbullying class. We have some interesting findings. Figure 23 demonstrates the results of the tested CNN model. For CB and Not_CB, the figures indicate consistent performance patterns. Regarding precision (P), recall (R) and F1, LSTM presented the best performance with 96.44%, 97.03% and 96.73%, respectively. Similarly, in terms of accuracy, the graph in Figure 24 demonstrates that the RNN (LSTM) model led to better performance, with 95.59%, than the CNN or the combined CNN-RNN model.

Figure 23: Results of the 2-classes classification

Table 11: Performance for the 2-classes classification
Model      Category   P        R        F1       A
CNN        CB         0.951    0.970    0.961    0.947
           Not_CB     0.936    0.898    0.919
LSTM       CB         0.9644   0.9703   0.9673   0.9559
           Not_CB     0.9379   0.9261   0.9320
CNN-LSTM   CB         0.951    0.970    0.961    0.899
           Not_CB     0.92     0.88     0.90

Figure 24: Accuracy on the 2-classes classification

For the 6-classes classification, Table 12, Table 13 and Table 14 depict the results from our tested models CNN, LSTM and CNN + LSTM, respectively. The tables report precision, recall, F1-score, accuracy and loss for all six classes, and we have discovered some intriguing things. The tested outcomes of the CNN, LSTM and CNN + LSTM models are shown in Figure 25, Figure 26 and Figure 27, respectively. The figures show consistent performance patterns for ACB, LCB, NCB, PSCB, RCB and SCB. For the CNN model, LCB demonstrated the best performance in terms of precision (P) with 89%, while PSCB showed the best performance for recall (R) with 89%. For the F1-score, LCB and ACB presented the highest readings with 87% and 86%, respectively. Similarly, for the LSTM model, LCB showed the best precision with 93%, PSCB achieved the best recall with 88%, and LCB presented the highest F1-score with 89%. The worst results among all classes were obtained for SCB. The accuracy and loss of the LSTM model, given in Table 13, are 0.783 and 0.9236, respectively.

Table 12: Results of CNN on the 6-classes classification (accuracy 0.7875, loss 0.984)
Category   P      R      F1
ACB        0.87   0.86   0.86
LCB        0.89   0.86   0.87
NCB        0.78   0.69   0.73
PSCB       0.78   0.89   0.83
RCB        0.42   0.34   0.38
SCB        0.16   0.13   0.14

Figure 25: Results of implementing CNN on the 6-classes classification

Table 13: Performance of LSTM on the 6-classes classification (accuracy 0.783, loss 0.9236)
Category   P      R      F1
ACB        0.86   0.86   0.86
LCB        0.93   0.85   0.89
NCB        0.75   0.71   0.73
PSCB       0.78   0.88   0.83
RCB        0.37   0.23   0.28
SCB        0.09   0.04   0.06

Figure 26: Results of LSTM on the 6-classes classification

The results in Table 14 and Figure 27 show the performance of the CNN-LSTM model for the 6-classes classification. Category LCB gave the best precision (P) with 91%, while ACB achieved the best recall with 90%; ACB and LCB were equal at 87% for the F1-score. The accuracy of the CNN-LSTM is 77.9%, while the loss is 1.026. Figure 28 depicts the accuracy of CNN, LSTM and CNN-LSTM for the 6-classes classification. It is noticeable that CNN achieved the highest accuracy with 78.75%, compared with 78.3% for LSTM and 77.9% for CNN-LSTM. Regarding execution time, CNN spent less time executing than LSTM. The worst results were obtained for SCB and could be improved by improving the dataset. The results show that the target of the thesis was achieved, as the experiments answered its research questions.
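As a hedged illustration of how the per-class and macro-averaged scores reported in Tables 11-14 can be computed from model predictions using Equations 1-3, the snippet below uses scikit-learn; the library choice and the variable names y_true and y_pred are assumptions, as the dissertation does not state its evaluation tooling.

```python
from sklearn.metrics import precision_recall_fscore_support, accuracy_score

LABELS = ['ACB', 'LCB', 'NCB', 'PSCB', 'RCB', 'SCB']    # 6-class scheme of this study

def evaluate(y_true, y_pred, labels=LABELS):
    """Per-class precision, recall and F1 (Eq. 1-3) plus accuracy and macro-F1."""
    p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred,
                                                  labels=labels, zero_division=0)
    for lab, pi, ri, fi in zip(labels, p, r, f1):
        print(f'{lab}: P={pi:.2f}  R={ri:.2f}  F1={fi:.2f}')
    acc = accuracy_score(y_true, y_pred)
    macro_f1 = f1.mean()                                  # macro-average over the classes
    print(f'Accuracy = {acc:.4f}   Macro-F1 = {macro_f1:.4f}')
    return acc, macro_f1

# For the 2-classes case, the same call with labels=['CB', 'Not_CB'] yields Table 11-style rows.
```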
Table 14: Performance of the CNN-LSTM on the 6-classes classification (accuracy 0.779, loss 1.026)
Category   P      R      F1
ACB        0.84   0.90   0.87
LCB        0.91   0.83   0.87
NCB        0.75   0.71   0.73
PSCB       0.84   0.82   0.83
RCB        0.38   0.36   0.37
SCB        0.14   0.26   0.18

Figure 27: Results of CNN-LSTM on the 6-classes classification
Figure 28: Accuracy in the 6-classes classification
Figure 29: Accuracy in both the 2-classes and 6-classes classifications

7 CHAPTER SEVEN: CONCLUSION AND FUTURE WORK
The major goals of smart cities are to raise the standard of living of their residents, promote economic expansion while ensuring that development is sustainable, and effectively provide essential public services. Events of a social, economic, or political character prompt a major and immediate public response in a fast-paced, hyper-connected society. As it may be used to anticipate public reactions, public opinion is a form of intelligence that is very important to governments. To take action effectively and sustainably, it is vital to gather and analyze public opinion.

Deep learning models have been successfully used to identify and stop Arabic cyberbullying, with encouraging outcomes. Cyberbullying is an issue that has worsened as social media usage has grown in the Arab world, and it needs creative solutions. Arabic text and speech have been used to recognize and categorize cyberbullying content, and deep learning models such as convolutional neural networks and recurrent neural networks have proven successful in doing so. It is crucial to remember that the creation of these models requires a sizable amount of labeled data, which can be challenging to obtain for Arabic cyberbullying. Accurate model development is further complicated by the cultural and linguistic idiosyncrasies found in Arabic text and speech. Hence, to increase the efficiency and dependability of deep learning models for identifying and preventing Arabic cyberbullying, continuous study and collaboration between linguists, psychologists, and machine learning specialists are required.

Due to the growing issue of cyberbullying speech on Arabic Twitter, an efficient automatic cyberbullying detection solution is urgently needed. The methodology suggested in this work provides real-time sentiment intelligence reports by combining a number of approaches, including automatic data extraction, cyberbullying recognition, sentiment annotation tools, and a hybrid approach to text categorization. In this study, we created a public dataset of about 30,000 tweets to assess the effectiveness of the proposed method. The dataset was classified in two cases: the first case is a 2-classes classification labeled as Cyberbullying (CB) and Not_Cyberbullying (Not_CB), and the second case consists of six classes labeled as Sexual Cyberbullying (SCB), Animal-phrase Cyberbullying (ACB), Looking Cyberbullying (LCB), Religious Cyberbullying (RCB), Psychological Cyberbullying (PSCB) and Not-Cyberbullying (NCB). Three distinct models, CNN, LSTM and CNN + LSTM, were assessed and contrasted. The outcomes of our research were encouraging and demonstrated the utility of the suggested models for the detection task. For the 2-classes classification, LSTM outperformed the other models with 95.59% accuracy, while CNN achieved the best performance in the 6-classes classification with 78.75% accuracy. As future work, we think there are a number of opportunities to expand and enhance this research.
We intend to expand the types of data analysis currently being done, such as investigating how attitudes change over time or vary across geographic borders. The dataset may be expanded to include other topics, writing patterns, and writing styles. To improve the detection outcomes beyond binary classification, the dataset can additionally be annotated with multiple labels. Additionally, we want to concentrate on identifying extreme viewpoints that might serve as a springboard for violent behavior. Lastly, we plan to enhance the capabilities of our solution by adding support for images and dialect analysis.

REFERENCES
[1] M. A. Al-Ajlan and M. Ykhlef, “Optimized Twitter Cyberbullying Detection based on Deep Learning,” in 2018 21st Saudi Computer Society National Computer Conference (NCC), Riyadh, Apr. 2018, pp. 1–5. doi: 10.1109/NCG.2018.8593146.
[2] D. Mouheb, R. Albarghash, M. F. Mowakeh, Z. Al Aghbari, and I. Kamel, “Detection of Arabic cyberbullying on social networks using machine learning,” in 2019 IEEE/ACS 16th International Conference on Computer Systems and Applications (AICCSA), 2019, pp. 1–5.
[3] D. Mouheb, R. Albarghash, M. F. Mowakeh, Z. A. Aghbari, and I. Kamel, “Detection of Arabic Cyberbullying on Social Networks using Machine Learning,” in 2019 IEEE/ACS 16th International Conference on Computer Systems and Applications (AICCSA), Abu Dhabi, United Arab Emirates, Nov. 2019, pp. 1–5. doi: 10.1109/AICCSA47632.2019.9035276.
[4] R. R. Dalvi, S. Baliram Chavan, and A. Halbe, “Detecting A Twitter Cyberbullying Using Machine Learning,” in 2020 4th International Conference on Intelligent Computing and Control Systems (ICICCS), May 2020, pp. 297–301. doi: 10.1109/ICICCS48265.2020.9120893.
[5] V. Nandakumar, B. C. Kovoor, and M. U. Sreeja, “Cyberbullying Revelation In Twitter Data Using Naïve Bayes Classifier Algorithm,” International Journal of Advanced Research in Computer Science, vol. 9, no. 1, 2018.
[6] A. S. Alqarafi, A. Adeel, M. Gogate, K. Dashitpour, A. Hussain, and T. Durrani, “Toward’s Arabic multi-modal sentiment analysis,” Communications, Signal Processing, and Systems, Jun. 2018, doi: 10.1007/978-981-10-6571-2_290.
[7] H. AL-Rubaiee, R. Qiu, K. Alomar, and D. Li, “Techniques for Improving the Labelling Process of Sentiment Analysis in the Saudi Stock Market,” International Journal of Advanced Computer Science and Applications (IJACSA), vol. 9, no. 3, Art. no. 3, 2018, doi: 10.14569/IJACSA.2018.090307.
[8] R. Baly et al., “OMAM at SemEval-2017 Task 4: Evaluation of English State-of-the-Art Sentiment Analysis Models for Arabic and a New Topic-based Model,” in Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), Vancouver, Canada, Aug. 2017, pp. 603–610. doi: 10.18653/v1/S17-2099.
[9] R. Baly et al., “A characterization study of Arabic Twitter data with a benchmarking for state-of-the-art opinion mining models,” in Proceedings of the Third Arabic Natural Language Processing Workshop, 2017, pp. 110–118.
[10] M. Gridach, H. Haddad, and H. Mulki, “Empirical Evaluation of Word Representations on Arabic Sentiment Analysis,” in Arabic Language Processing: From Theory to Practice, vol. 782, A. Lachkar, K. Bouzoubaa, A. Mazroui, A. Hamdani, and A. Lekhouaja, Eds. Cham: Springer International Publishing, 2018, pp. 147–158. doi: 10.1007/978-3-319-73500-9_11.
[11] K. Alomari, H. Elsherif, and K. Shaalan, “Arabic Tweets Sentimental Analysis Using Machine Learning,” 2017, pp. 602–610. doi: 10.1007/978-3-319-60042-0_66.
[12] S. R.
El-Beltagy, T. Khalil, A. Halaby, and M. Hammad, “Combining Lexical Features and a Supervised Learning Approach for Arabic Sentiment Analysis,” in Computational Linguistics and Intelligent Text Processing, Cham, 2018, pp. 307–319. doi: 10.1007/978-3-319-75487-1_24.
[13] A. Elnagar, Y. S. Khalifa, and A. Einea, “Hotel Arabic-Reviews Dataset Construction for Sentiment Analysis Applications,” in Intelligent Natural Language Processing: Trends and Applications, K. Shaalan, A. E. Hassanien, and F. Tolba, Eds. Cham: Springer International Publishing, 2018, pp. 35–52. doi: 10.1007/978-3-319-67056-0_3.
[14] B. Haidar, M. Chamoun, and A. Serhrouchni, “Arabic Cyberbullying Detection: Using Deep Learning,” in 2018 7th International Conference on Computer and Communication Engineering (ICCCE), Sep. 2018, pp. 284–289. doi: 10.1109/ICCCE.2018.8539303.
[15] R. T. Ali and M.-B. Kurdy, “Cyberbullying Detection in Syrian Slang on Social Media by using Data Mining”.
[16] P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov, “Enriching Word Vectors with Subword Information.” arXiv, Jun. 19, 2017. doi: 10.48550/arXiv.1607.04606.
[17] H. Haddad, H. Mulki, and A. Oueslati, “T-HSAB: A Tunisian Hate Speech and Abusive Dataset,” in Arabic Language Processing: From Theory to Practice, Cham, 2019, pp. 251–263. doi: 10.1007/978-3-030-32959-4_18.
[18] H. Mulki, H. Haddad, C. Bechikh Ali, and H. Alshabani, “L-HSAB: A Levantine Twitter Dataset for Hate Speech and Abusive Language,” in Proceedings of the Third Workshop on Abusive Language Online, Florence, Italy, Aug. 2019, pp. 111–118. doi: 10.18653/v1/W19-3512.
[19] H. Mubarak, A. Rashed, K. Darwish, Y. Samih, and A. Abdelali, “Arabic Offensive Language on Twitter: Analysis and Experiments,” in Proceedings of the Sixth Arabic Natural Language Processing Workshop, Kyiv, Ukraine (Virtual), Apr. 2021, pp. 126–135. Accessed: Feb. 24, 2023. [Online]. Available: https://aclanthology.org/2021.wanlp-1.13
[20] A. Safaya, M. Abdullatif, and D. Yuret, “KUISAIL at SemEval-2020 Task 12: BERT-CNN for Offensive Speech Identification in Social Media,” in Proceedings of the Fourteenth Workshop on Semantic Evaluation, Barcelona (online), Dec. 2020, pp. 2054–2059. doi: 10.18653/v1/2020.semeval-1.271.
[21] L. Cheng, R. Guo, Y. Silva, D. Hall, and H. Liu, “Hierarchical attention networks for cyberbullying detection on the Instagram social network,” in Proceedings of the 2019 SIAM International Conference on Data Mining, 2019, pp. 235–243.
[22] S. Agrawal and A. Awekar, “Deep Learning for Detecting Cyberbullying Across Multiple Social Media Platforms,” in Advances in Information Retrieval, vol. 10772, G. Pasi, B. Piwowarski, L. Azzopardi, and A. Hanbury, Eds. Cham: Springer International Publishing, 2018, pp. 141–153. doi: 10.1007/978-3-319-76941-7_11.
[23] P. Badjatiya, S. Gupta, M. Gupta, and V. Varma, “Deep Learning for Hate Speech Detection in Tweets,” in Proceedings of the 26th International Conference on World Wide Web Companion - WWW ’17 Companion, Perth, Australia, 2017, pp. 759–760. doi: 10.1145/3041021.3054223.
[24] A. Arango, J. Pérez, and B. Poblete, “Hate speech detection is not as easy as you may think: A closer look at model validation (extended version),” Information Systems, vol. 105, p. 101584, Mar. 2022, doi: 10.1016/j.is.2020.101584.
[25] A. S. Saksesi, M. Nasrun, and C.
Setianingsih, “Analysis Text of Hate Speech Detection Using Recurrent Neural Network,” in 2018 International Conference on Control, Electronics, Renewable Energy and Communications (ICCEREC), Bandung, Indonesia, Dec. 2018, pp. 242–248. doi: 10.1109/ICCEREC.2018.8712104.
[26] S. Biere, “Hate Speech Detection Using Natural Language Processing Techniques”.
[27] “Cyberbullying Research Center - How to Identify, Prevent, and Respond,” Cyberbullying Research Center. https://cyberbullying.org/ (accessed Jan. 17, 2023).
[28] S. R and S. Pk, “Cyberbullying: another main type of bullying?,” Scandinavian Journal of Psychology, vol. 49, no. 2, Apr. 2008, doi: 10.1111/j.1467-9450.2007.00611.x.
[29] “STOP Cyberbullying - New Website Launching Soon!!,” http://www.stopcyberbullying.org/ (accessed Jan. 17, 2023).
[30] V. Šléglová and A. Cerna, “Cyberbullying in adolescent victims: Perception and coping,” Cyberpsychology: Journal of Psychosocial Research on Cyberspace, vol. 5, no. 2, 2011.
[31] B. Pang and L. Lee, “Opinion mining and sentiment analysis,” Foundations and Trends® in Information Retrieval, vol. 2, no. 1–2, pp. 1–135, 2008.
[32] A. Shelar and C.-Y. Huang, “Sentiment analysis of twitter data,” in 2018 International Conference on Computational Science and Computational Intelligence (CSCI), 2018, pp. 1301–1302.
[33] T. Mahlangu, C. Tu, and P. Owolawi, “A review of automated detection methods for cyberbullying,” in 2018 International Conference on Intelligent and Innovative Computing Applications (ICONIC), 2018, pp. 1–5.
[34] M. A. Al-Garadi, K. D. Varathan, and S. D. Ravana, “Cybercrime detection in online communications: The experimental case of cyberbullying detection in the Twitter network,” Computers in Human Behavior, vol. 63, pp. 433–443, 2016.
[35] M. Munezero, M. Mozgovoy, T. Kakkonen, V. Klyuev, and E. Sutinen, “Antisocial behavior corpus for harmful language detection,” in 2013 Federated Conference on Computer Science and Information Systems, 2013, pp. 261–265.
[36] H.-T. Kao, S. Yan, D. Huang, N. Bartley, H. Hosseinmardi, and E. Ferrara, “Understanding cyberbullying on Instagram and Ask.fm via social role detection,” in Companion Proceedings of The 2019 World Wide Web Conference, 2019, pp. 183–188.
[37] I. Nazar, D.-S. Zois, and M. Yao, “A hierarchical approach for timely cyberbullying detection,” in 2019 IEEE Data Science Workshop (DSW), 2019, pp. 190–195.
[38] E. Wulczyn, N. Thain, and L. Dixon, “Ex machina: Personal attacks seen at scale,” in Proceedings of the 26th International Conference on World Wide Web, 2017, pp. 1391–1399.
[39] J. Zhao, K. Liu, and L. Xu, “Sentiment analysis: mining opinions, sentiments, and emotions.” MIT Press, 2016.
[40] W. Medhat, A. Hassan, and H. Korashy, “Sentiment analysis algorithms and applications: A survey,” Ain Shams Engineering Journal, vol. 5, no. 4, pp. 1093–1113, 2014.
[41] J.-M. Xu, X. Zhu, and A. Bellmore, “Fast learning for sentiment analysis on bullying,” in Proceedings of the First International Workshop on Issues of Sentiment Discovery and Opinion Mining, 2012, pp. 1–6.
[42] H. Dani, J. Li, and H. Liu, “Sentiment informed cyberbullying detection in social media,” in Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2017, pp. 52–67.
[43] V. Nahar, S. Al-Maskari, X. Li, and C. Pang, “Semi-supervised learning for cyberbullying detection in social networks,” in Australasian Database Conference, 2014, pp. 160–171.
[44] A.
Seyeditabari, N. Tabari, and W. Zadrozny, “Emotion Detection in Text: a Review.” arXiv, Jun. 02, 2018. doi: 10.48550/arXiv.1806.00674.
[45] S. M. Mohammad, “From once upon a time to happily ever after: Tracking emotions in mail and books,” Decision Support Systems, vol. 53, no. 4, pp. 730–741, 2012.
[46] M. J. Vioules, B. Moulahi, J. Azé, and S. Bringay, “Detection of suicide-related posts in Twitter data streams,” IBM Journal of Research and Development, vol. 62, no. 1, pp. 7–1, 2018.
[47] J. Pestian, H. Nasrallah, P. Matykiewicz, A. Bennett, and A. Leenaars, “Suicide note classification using natural language processing: A content analysis,” Biomedical Informatics Insights, vol. 3, p. BII-S4706, 2010.
[48] J. A. Patch, “DETECTING BULLYING ON TWITTER USING EMOTION LEXICONS”.
[49] A. Farghaly and K. Shaalan, “Arabic natural language processing: Challenges and solutions,” ACM Transactions on Asian Language Information Processing (TALIP), vol. 8, no. 4, pp. 1–22, 2009.
[50] A. Abdelali, J. Cowie, and H. S. Soliman, “Arabic Information Retrieval Perspectives”.
[51] N. G. Ali and N. Omar, “Arabic keyphrases extraction using a hybrid of statistical and machine learning methods,” in Proceedings of the 6th International Conference on Information Technology and Multimedia, 2014, pp. 281–286.
[52] G. J. McLachlan, Discriminant Analysis and Statistical Pattern Recognition. John Wiley & Sons, 2005.
[53] T. Haifley, “Linear logistic regression: an introduction,” in IEEE International Integrated Reliability Workshop Final Report, 2002, pp. 184–187.
[54] K. Shaalan and H. Raza, “Arabic named entity recognition from diverse text types,” in International Conference on Natural Language Processing, 2008, pp. 440–451.
[55] A. El-Halees, “Filtering Spam E-Mail from Mixed Arabic and English Messages: A Comparison of Machine Learning Techniques,” vol. 6, no. 1, 2009.
[56] T. Cover and P. Hart, “Nearest neighbor pattern classification,” IEEE Transactions on Information Theory, vol. 13, no. 1, pp. 21–27, Jan. 1967, doi: 10.1109/TIT.1967.1053964.
[57] N. El-Mawass and S. Alaboodi, “Detecting Arabic spammers and content polluters on Twitter,” in 2016 Sixth International Conference on Digital Information Processing and Communications (ICDIPC), 2016, pp. 53–58.
[58] R. M. Duwairi, M. Alfaqeh, M. Wardat, and A. Alrabadi, “Sentiment analysis for Arabizi text,” in 2016 7th International Conference on Information and Communication Systems (ICICS), 2016, pp. 127–132.
[59] “Sentiment Analyzer for Arabic Comments System.” https://www.researchgate.net/publication/291748065_Sentiment_Analyzer_for_Arabic_Comments_System (accessed Jan. 21, 2023).
[60] R. M. Duwairi, R. Marji, N. Sha’ban, and S. Rushaidat, “Sentiment Analysis in Arabic tweets,” in 2014 5th International Conference on Information and Communication Systems (ICICS), Apr. 2014, pp. 1–6. doi: 10.1109/IACS.2014.6841964.
[61] R. M. Duwairi, “Sentiment analysis for dialectical Arabic,” in 2015 6th International Conference on Information and Communication Systems (ICICS), 2015, pp. 166–170.
[62] S. Alhumoud, T. Albuhairi, and M. Altuwaijri, “Arabic sentiment analysis using WEKA a hybrid learning approach,” in 2015 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K), 2015, vol. 1, pp. 402–408.
[63] A. Al-Zyoud and W. A. Al-Rabayah, “Arabic stemming techniques: comparisons and new vision,” in 2015 IEEE 8th GCC Conference & Exhibition, 2015, pp. 1–6.
[64] L. S. Larkey, L. Ballesteros, and M. E.
Connell, “Light Stemming for Arabic Information Retrieval,” in Arabic Computational Morphology: Knowledge-based and Empirical Methods, A. Soudi, A. van den Bosch, and G. Neumann, Eds. Dordrecht: Springer Netherlands, 2007, pp. 221–243. doi: 10.1007/978-1-4020-6046-5_12.
[65] “Bullying and Emotional Abuse in the Workplace: International Perspectives in Research and Practice - ProQuest.” https://www.proquest.com/openview/a9dad329b4fee4a986a040dbba2b5335/1?pq-origsite=gscholar&cbl=36693 (accessed Feb. 13, 2023).
[66] M. Dadvar and F. De Jong, “Cyberbullying detection: a step toward a safer internet yard,” in Proceedings of the 21st International Conference on World Wide Web, 2012, pp. 121–126.
[67] J. Escartín, D. Zapf, C. Arrieta, and Á. Rodríguez-Carballeira, “Workers’ perception of workplace bullying: A cross-cultural study,” European Journal of Work and Organizational Psychology, vol. 20, no. 2, pp. 178–205, Apr. 2011, doi: 10.1080/13594320903395652.