A New English/Arabic Parallel Corpus for Phishing Emails

dc.contributor.author: SALLOUM, SAID
dc.contributor.author: GABER, TAREK
dc.contributor.author: VADERA, SUNIL
dc.contributor.author: SHAALAN, KHALED
dc.date.accessioned: 2025-02-11T04:46:49Z
dc.date.available: 2025-02-11T04:46:49Z
dc.date.issued: 2023
dc.description.abstract: Phishing involves malicious activity whereby phishers, in the disguise of legitimate entities, obtain illegitimate access to the victims’ personal and private information, usually through emails. Currently, phishing attacks and threats are being handled effectively through the use of the latest phishing email detection solutions. Most current phishing detection systems assume phishing attacks to be in English, though attacks in other languages are growing. In particular, Arabic is a widely used language and therefore represents a vulnerable target. However, there is a significant shortage of corpora that can be used to develop Arabic phishing detection systems. This article presents the development of a new English-Arabic parallel phishing email corpus that has been developed from the anti-phishing shared task text (IWSPA-AP 2018). The email content was to be translated, and the task had been allotted to 10 volunteers who had a university background and were English and Arabic language experts. To evaluate the effectiveness of the new corpus, we develop phishing email detection models using Term Frequency–Inverse Document Frequency and Multilayer Perceptron using 1,258 emails in Arabic and English that have equal ratios of legitimate and phishing emails. The experimental findings show that the accuracy reaches 96.82% for the Arabic dataset and 94.63% for the emails in English, providing some assurance of the potential value of the parallel corpus developed.
dc.identifier.citation: Salloum, S. et al. (2023) “A New English/Arabic Parallel Corpus for Phishing Emails,” ACM Transactions on Asian and Low-Resource Language Information Processing, 22(7), pp. 1–17.
dc.identifier.doi: https://doi.org/10.1145/3606031
dc.identifier.issn: 2375-4699, 2375-4702
dc.identifier.uri: https://bspace.buid.ac.ae/handle/1234/2794
dc.language.iso: en
dc.publisher: ACM Digital Library
dc.relation.ispartofseries: ACM Transactions on Asian and Low-Resource Language Information Processing, v22 n7 (20230725): 1-17
dc.subject: CCS Concepts: • Computing methodologies → Language resources; Additional Key Words and Phrases: English–Arabic Parallel Corpus, phishing emails, Multilayer Perceptron, frequency–inverse document frequency
dc.title: A New English/Arabic Parallel Corpus for Phishing Emails
dc.type: Article
