Deep Learning for Arabic Image Captioning: A Comparative Study of Main Factors and Preprocessing Recommendations

dc.contributor.author: Hejazi, Hani
dc.contributor.author: Shaalan, Khaled
dc.date.accessioned: 2025-05-14T09:59:06Z
dc.date.available: 2025-05-14T09:59:06Z
dc.date.issued: 2021
dc.description.sponsorship: Image captioning has been a major research focus over the last decade, with most efforts aimed at English captions. Because little work has been done for Arabic, relying on translation as an alternative to generating Arabic captions directly lets errors accumulate across translation and caption prediction. When working with Arabic datasets, preprocessing is crucial, and handling Arabic morphological features such as nunation requires additional steps. We tested 32 different combinations of variables that affect caption generation, including preprocessing, deep learning techniques (LSTM and GRU), dropout, and feature extraction (Inception V3, VGG16). Our results on the only publicly available Arabic dataset outperform the best reported result, with BLEU-1 = 36.5, BLEU-2 = 21.4, BLEU-3 = 12, and BLEU-4 = 6.6. This study demonstrates that Arabic preprocessing and VGG16 image feature extraction enhanced Arabic caption quality, but we saw no measurable difference when using dropout or LSTM instead of GRU.
dc.identifier.citation: Hejazi, H. and Shaalan, K. (2021) “Deep Learning for Arabic Image Captioning: A Comparative Study of Main Factors and Preprocessing Recommendations,” International Journal of Advanced Computer Science and Applications, 12(11), p. n/a.
dc.identifier.doi: https://doi.org/10.14569/IJACSA.2021.0121105
dc.identifier.issn: 2158-107X
dc.identifier.uri: https://bspace.buid.ac.ae/handle/1234/3019
dc.language.iso: en
dc.publisher: Deep learning; NLP; Arabic image captioning; Arabic text preprocessing; LSTM; VGG16; INCEPTION V3
dc.relation.ispartofseries: International Journal of Advanced Computer Science and Applications v12 n11 (2021): n/a
dc.subject: Deep learning; NLP; Arabic image captioning; Arabic text preprocessing; LSTM; VGG16; INCEPTION V3
dc.title: Deep Learning for Arabic Image Captioning: A Comparative Study of Main Factors and Preprocessing Recommendations
dc.type: Article