A Comparative Analysis of Deep Learning Models for Facial Expression Recognition at COP28
Date
2024-07
Publisher
The British University in Dubai (BUiD)
Abstract
Facial expression recognition (FER) has primarily been studied in controlled environments, which do not adequately represent the dynamic conditions of real-world events. This study aims to bridge that gap by conducting a comparative analysis of deep learning models tailored to the context of COP28, a large-scale public event in the UAE. The research evaluated several deep learning architectures, including a standard convolutional neural network (CNN), an expanded CNN, a multi-path CNN, a pre-trained Visual Geometry Group 16 (VGG16) transfer-learning model, a recurrent neural network (RNN), a long short-term memory (LSTM) network, a gated recurrent unit (GRU), and a vision transformer (ViT), using the Karolinska Directed Emotional Faces (KDEF) dataset for training and validation. Performance was then tested on unseen images from COP28. The findings revealed that CNN-based models, particularly the multi-path CNN and VGG16, outperformed sequential models such as the RNN, LSTM, and GRU, with the multi-path CNN achieving the highest accuracy (71.70%) on COP28 test images. The expanded CNN performed strongly on the KDEF dataset, achieving a training accuracy of 97.7%. Although most models performed exceptionally well in controlled settings, their performance varied when applied to COP28, indicating room for improvement in generalization. Limitations included dataset constraints and performance variability across expressions. Future research should focus on more diverse datasets, advanced data augmentation, ensemble methods, newer architectures, multimodal approaches, and the cultural factors that influence FER, while considering ethical implications. This work underscores the importance of adapting FER models to real-world scenarios and provides insights for improving FER accuracy and generalization.
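For readers unfamiliar with the multi-path design named above, the following is a minimal Keras sketch of a multi-path CNN for seven-class expression recognition (KDEF covers seven expressions). It is illustrative only: the branch count, kernel sizes, input resolution, and classification head are assumptions for the sketch, not the architecture reported in the thesis.

```python
# Minimal multi-path CNN sketch for 7-class FER.
# All hyperparameters below are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_multipath_cnn(input_shape=(48, 48, 1), num_classes=7):
    inputs = layers.Input(shape=input_shape)
    # Parallel convolutional paths with different kernel sizes
    # capture facial features at multiple spatial scales.
    paths = []
    for k in (3, 5, 7):  # hypothetical kernel sizes per path
        x = layers.Conv2D(32, k, padding="same", activation="relu")(inputs)
        x = layers.MaxPooling2D(2)(x)
        x = layers.Conv2D(64, k, padding="same", activation="relu")(x)
        x = layers.GlobalAveragePooling2D()(x)
        paths.append(x)
    # Fuse the paths before the classification head.
    merged = layers.Concatenate()(paths)
    x = layers.Dense(128, activation="relu")(merged)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return models.Model(inputs, outputs)

model = build_multipath_cnn()
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

The design intuition is that faces photographed at an uncontrolled event vary in scale and pose, so parallel branches with different receptive fields can be more robust than a single-kernel stack; the concatenation layer lets the classifier weigh all scales jointly.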