A systematic review of Arabic text classification: areas, applications, and future directions

Date
2023
Abstract
Text classification is the automated process of assigning predefined labels or categories to textual data. A comprehensive review of the existing literature on Arabic text classification (ATC) reveals that most research concentrates on methodologies and approaches, with no thorough evaluation of ATC. Consequently, this systematic review aims to offer a comprehensive understanding of the state of the art in ATC, illuminate the present challenges, and discuss prominent trends in large-scale research. From a collection of 2875 studies, 60 satisfied the eligibility criteria and were rigorously analyzed. The selected studies were divided into three categories: topic areas, tasks/applications, and ATC phases. The topic areas were classified into six primary sectors: healthcare; legal; security and cybersecurity; history, culture, and religion; social media; and agriculture. The ATC tasks/applications were classified into nine groups: gender identification, author identification, disease detection, threat and spam detection, dialect identification, hierarchical categorization, news article classification, web page clustering, and question classification. The ATC phases were organized into five categories: corpus creation, preprocessing (stemming and tokenization), feature selection, feature extraction, and classifiers/approaches. The review emphasizes the proposed solutions in each ATC study and offers insights for future research. It also underscores the potential applications of ATC in addressing current challenges across various industries and highlights the significance of developing a benchmark dataset for ATC to facilitate model comparison. The review concludes by proposing areas where further research is required, such as addressing the unbalanced dataset issue, enhancing the preprocessing phase, and exploring the role of human factors in the use of ATC systems.