1. Recognize after early fusion: the Chinese food recognition based on the alignment of image and ingredients.
- Author
- Zhang, Ruoxuan; Ouyang, Dantong; He, Lili; Kuang, Lingjin; Bai, Hongtao
- Abstract
As health concerns continue to grow, more and more work is being done in the field of food computing. A basic problem in food computing is how to extract important information about food from a picture and analyze it. However, food recognition poses some challenges. One challenge is that the type of food is closely related to its ingredients. Another is that in Chinese dietary habits a single meal typically includes multiple dishes, whereas existing food image datasets contain only single-food pictures. To address these challenges, we propose Recognize After Early Fusion (RAEF), a Chinese food recognition model based on the alignment of image and ingredients. We use a Vision Transformer (ViT) as the backbone of our model and an early fusion model to combine visual and ingredient features. Because no suitable dataset exists for multi-label food recognition models, we propose a new Chinese food dataset named Chinsefood-130. The dataset is in password: mr2b. In our experiments, RAEF performs well on both food and ingredient recognition. Compared with ViT, RAEF improves the F1 score by 10% on food recognition and by 12% on ingredient recognition. [ABSTRACT FROM AUTHOR]
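The abstract does not describe how the early fusion step is implemented. As a rough illustration of the general idea only, the sketch below concatenates image tokens and ingredient tokens into one sequence before a single shared self-attention step, so that image tokens can attend to ingredient tokens from the first layer onward. Everything here is an assumption for illustration: the function names (`self_attention`, `early_fusion`), the identity projections (Q = K = V), and the toy 4-dimensional token values are hypothetical and are not taken from RAEF.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head self-attention with identity projections (Q = K = V = tokens).

    A real transformer layer would use learned projection matrices; they are
    omitted here (assumption) to keep the sketch dependency-free.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    weights = []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * v[j] for w, v in zip(row, tokens)) for j in range(d)]
        for row in weights
    ]
    return outputs, weights


def early_fusion(image_tokens, ingredient_tokens):
    """Hypothetical early fusion: join both modalities into one sequence
    and let a shared attention step mix them."""
    fused = image_tokens + ingredient_tokens  # sequence-level concatenation
    return self_attention(fused)


# Toy inputs (made-up values): 3 image patch tokens and 2 ingredient tokens.
image_tokens = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
]
ingredient_tokens = [
    [0.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
]

fused_out, attn = early_fusion(image_tokens, ingredient_tokens)
```

In this toy setup, `attn[i][j]` is the weight that fused token `i` places on fused token `j`; indices 0 to 2 are image tokens and 3 to 4 are ingredient tokens. A late fusion design would instead encode each modality separately and combine only the final features.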
- Published
- 2024