Back to Search
Start Over
Vision-Enabled Large Language and Deep Learning Models for Image-Based Emotion Recognition.
- Source :
- Cognitive Computation; Sep2024, Vol. 16 Issue 5, p2566-2579, 14p
- Publication Year :
- 2024
-
Abstract
- The significant advancements in the capabilities, reasoning, and efficiency of artificial intelligence (AI)-based tools and systems are evident. Some noteworthy examples of such tools include generative AI-based large language models (LLMs) such as generative pretrained transformer 3.5 (GPT 3.5), generative pretrained transformer 4 (GPT-4), and Bard. LLMs are versatile and effective for various tasks such as composing poetry, writing codes, generating essays, and solving puzzles. Thus far, LLMs can only effectively process text-based input. However, recent advancements have enabled them to handle multimodal inputs, such as text, images, and audio, making them highly general-purpose tools. LLMs have achieved decent performance in pattern recognition tasks (such as classification), therefore, there is a curiosity about whether general-purpose LLMs can perform comparable or even superior to specialized deep learning models (DLMs) trained specifically for a given task. In this study, we compared the performances of fine-tuned DLMs with those of general-purpose LLMs for image-based emotion recognition. We trained DLMs, namely, a convolutional neural network (CNN) (two CNN models were used: C N N 1 and C N N 2 ), ResNet50, and VGG-16 models, using an image dataset for emotion recognition, and then tested their performance on another dataset. Subsequently, we subjected the same testing dataset to two vision-enabled LLMs (LLaVa and GPT-4). The C N N 2 was found to be the superior model with an accuracy of 62% while VGG16 produced the lowest accuracy with 31%. In the category of LLMs, GPT-4 performed the best, with an accuracy of 55.81%. LLava LLM had a higher accuracy than C N N 1 and VGG16 models. The other performance metrics such as precision, recall, and F1-score followed similar trends. However, GPT-4 performed the best with small datasets. The poor results observed in LLMs can be attributed to their general-purpose nature, which, despite extensive pretraining, may not fully capture the features required for specific tasks like emotion recognition in images as effectively as models fine-tuned for those tasks. The LLMs did not surpass specialized models but achieved comparable performance, making them a viable option for specific tasks without additional training. In addition, LLMs can be considered a good alternative when the available dataset is small. [ABSTRACT FROM AUTHOR]
Details
- Language :
- English
- ISSN :
- 18669956
- Volume :
- 16
- Issue :
- 5
- Database :
- Complementary Index
- Journal :
- Cognitive Computation
- Publication Type :
- Academic Journal
- Accession number :
- 178877051
- Full Text :
- https://doi.org/10.1007/s12559-024-10281-5