1. Multi-food detection using a modified swin-transfomer with recursive feature pyramid network.
- Author
- Lee, Chao-Yang, Khanum, Abida, and Kumar, Pinninti Praneeth
- Subjects
- CONVOLUTIONAL neural networks, TRANSFORMER models
- Abstract
Humans need food, and food detection is both a fascinating research topic and a complex weight-loss mechanism. Eating a healthy, balanced diet is crucial. Over the last few years, studies on food-related tasks have progressed, but existing systems handle single- and multi-food items only by using convolutional neural networks (CNNs). Recent Transformers in computer vision have outperformed well-known networks such as ResNet-50 and VGG-16, yet Transformer-based methods remain limited in food-related tasks. To address this, we improve a Swin Transformer-based method by combining the strengths of Transformers and CNNs. We propose a novel multi-food detection method, MFD-MST, built on a modified Swin Transformer and a Recursive Feature Pyramid (RFP) network. It uses a Swin Transformer with a spatial extraction block (STSE) as the backbone, to improve the local and structural information extracted from the image, and RFP as the neck, to enhance the feature representation used to recognize multi-food items. STSE addresses Transformer positional encoding, and RFP can look at the feature map twice, helping the model build powerful representations. The method was tested extensively on three prominent datasets, UEC-FOOD 100, Indian Food 28, and UEC-FOOD 256, to construct a detection system that can distinguish various objects in food photos and classify them into food categories. Our model's evaluation measures vary. MFD-MST outperforms Swin Transformer by 2.7%, 3.3%, and 1.4% at AP[0.50] on the three food datasets, respectively. Test results suggest that our system accurately detects food items. [ABSTRACT FROM AUTHOR]
- Published
- 2024