Enhancing Visual Question Answering for Arabic Language Using LLaVa and Reinforcement Learning
- Author
- ElMaghraby, Asmaa; Maged, Samaa; Essawey, Mohamed; ElFaramawy, Rawan; Negm, Esraa; Khoriba, Ghada
- Subjects
- LANGUAGE models; REINFORCEMENT learning; DEEP learning; ARABIC language; LEARNING; CHATBOTS
- Abstract
Visual Question Answering (VQA) systems have achieved remarkable advances by combining text-based question answering with image analysis, producing machines that can comprehend and answer questions about visual content. Despite these developments, there remains a notable lack of VQA solutions designed specifically for the Arabic language, even with the significant progress in deep learning techniques and large language models (LLMs). Our research introduces ArabicQuest, a chatbot designed specifically for Arabic-speaking users. ArabicQuest combines the LLaVa large language model, a dedicated translation model, and reinforcement learning from human feedback (RLHF) to integrate Arabic text with visual data effectively. Through Telegram's API, ArabicQuest offers a seamless user interface for asking image-based questions. The proposed pipeline continuously improves its accuracy and relevance by incorporating user feedback, achieving an accuracy of 86%. ArabicQuest is trained and evaluated on various datasets, including Visual Genome, RSVQA, and ChartsQA, to ensure robust performance.
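The abstract describes the pipeline only at a high level; the sketch below is one plausible reading of it, not the authors' code. It assumes stand-in Hugging Face checkpoints (Helsinki-NLP/opus-mt-ar-en and Helsinki-NLP/opus-mt-en-ar for the translation legs, llava-hf/llava-1.5-7b-hf for the LLaVa model), and it reduces the Telegram front end and the RLHF loop to a simple feedback log.

```python
# Minimal sketch of the described pipeline, under the assumptions named above:
# translate the Arabic question to English, query LLaVa on the image, translate
# the answer back, and log user feedback for later RLHF-style refinement.
from transformers import pipeline
from PIL import Image

ar_to_en = pipeline("translation", model="Helsinki-NLP/opus-mt-ar-en")
en_to_ar = pipeline("translation", model="Helsinki-NLP/opus-mt-en-ar")
vqa = pipeline("image-to-text", model="llava-hf/llava-1.5-7b-hf")

def answer_arabic_question(image_path: str, question_ar: str) -> str:
    """Arabic question in, Arabic answer out, with LLaVa reasoning in English."""
    question_en = ar_to_en(question_ar)[0]["translation_text"]
    prompt = f"USER: <image>\n{question_en} ASSISTANT:"
    image = Image.open(image_path)
    out = vqa(image, prompt=prompt, generate_kwargs={"max_new_tokens": 64})
    answer_en = out[0]["generated_text"].split("ASSISTANT:")[-1].strip()
    return en_to_ar(answer_en)[0]["translation_text"]

# Hypothetical feedback store: the paper's RLHF loop is not specified, so user
# ratings are simply collected here as training signal for a later tuning pass.
feedback_log = []

def record_feedback(question_ar: str, answer_ar: str, rating: int) -> None:
    feedback_log.append((question_ar, answer_ar, rating))
```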
- Published
- 2024