Author: "Jiang, Shuqiang" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Jiang, Shuqiang"' showing total 699 results



1. Sim-to-Real Transfer via 3D Feature Fields for Vision-and-Language Navigation

Author: Wang, Zihan, Li, Xiangyang, Yang, Jiahao, Liu, Yeqi, and Jiang, Shuqiang
Subjects: Computer Science - Robotics, Computer Science - Computer Vision and Pattern Recognition
Abstract: Vision-and-language navigation (VLN) enables the agent to navigate to a remote location in 3D environments following the natural language instruction. In this field, the agent is usually trained and evaluated in the navigation simulators, lacking effective approaches for sim-to-real transfer. The VLN agents with only a monocular camera exhibit extremely limited performance, while the mainstream VLN models trained with panoramic observation, perform better but are difficult to deploy on most monocular robots. For this case, we propose a sim-to-real transfer approach to endow the monocular robots with panoramic traversability perception and panoramic semantic understanding, thus smoothly transferring the high-performance panoramic VLN models to the common monocular robots. In this work, the semantic traversable map is proposed to predict agent-centric navigable waypoints, and the novel view representations of these navigable waypoints are predicted through the 3D feature fields. These methods broaden the limited field of view of the monocular robots and significantly improve navigation performance in the real world. Our VLN system outperforms previous SOTA monocular VLN methods in R2R-CE and RxR-CE benchmarks within the simulation environments and is also validated in real-world environments, providing a practical and high-performance solution for real-world VLN., Comment: Accepted by CoRL 2024. The code is available at https://github.com/MrZihan/Sim2Real-VLN-3DFF
Published: 2024

2. FoodSky: A Food-oriented Large Language Model that Passes the Chef and Dietetic Examination

Author: Zhou, Pengfei, Min, Weiqing, Fu, Chaoran, Jin, Ying, Huang, Mingyu, Li, Xiangyang, Mei, Shuhuan, and Jiang, Shuqiang
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Food is foundational to human life, serving not only as a source of nourishment but also as a cornerstone of cultural identity and social interaction. As the complexity of global dietary needs and preferences grows, food intelligence is needed to enable food perception and reasoning for various tasks, ranging from recipe generation and dietary recommendation to diet-disease correlation discovery and understanding. Towards this goal, for powerful capabilities across various domains and tasks in Large Language Models (LLMs), we introduce Food-oriented LLM FoodSky to comprehend food data through perception and reasoning. Considering the complexity and typicality of Chinese cuisine, we first construct one comprehensive Chinese food corpus FoodEarth from various authoritative sources, which can be leveraged by FoodSky to achieve deep understanding of food-related data. We then propose Topic-based Selective State Space Model (TS3M) and the Hierarchical Topic Retrieval Augmented Generation (HTRAG) mechanism to enhance FoodSky in capturing fine-grained food semantics and generating context-aware food-relevant text, respectively. Our extensive evaluations demonstrate that FoodSky significantly outperforms general-purpose LLMs in both chef and dietetic examinations, with an accuracy of 67.2% and 66.4% on the Chinese National Chef Exam and the National Dietetic Exam, respectively. FoodSky not only promises to enhance culinary creativity and promote healthier eating patterns, but also sets a new standard for domain-specific LLMs that address complex real-world issues in the food domain. An online demonstration of FoodSky is available at http://222.92.101.211:8200., Comment: 32 pages, 19 figures
Published: 2024

3. DiffGen: Robot Demonstration Generation via Differentiable Physics Simulation, Differentiable Rendering, and Vision-Language Model

Author: Jin, Yang, Lv, Jun, Jiang, Shuqiang, and Lu, Cewu
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Generating robot demonstrations through simulation is widely recognized as an effective way to scale up robot data. Previous work often trained reinforcement learning agents to generate expert policies, but this approach lacks sample efficiency. Recently, a line of work has attempted to generate robot demonstrations via differentiable simulation, which is promising but heavily relies on reward design, a labor-intensive process. In this paper, we propose DiffGen, a novel framework that integrates differentiable physics simulation, differentiable rendering, and a vision-language model to enable automatic and efficient generation of robot demonstrations. Given a simulated robot manipulation scenario and a natural language instruction, DiffGen can generate realistic robot demonstrations by minimizing the distance between the embedding of the language instruction and the embedding of the simulated observation after manipulation. The embeddings are obtained from the vision-language model, and the optimization is achieved by calculating and descending gradients through the differentiable simulation, differentiable rendering, and vision-language model components, thereby accomplishing the specified task. Experiments demonstrate that with DiffGen, we could efficiently and effectively generate robot data with minimal human effort or training time.
Published: 2024

4. Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation

Author: Wang, Zihan, Li, Xiangyang, Yang, Jiahao, Liu, Yeqi, Hu, Junjie, Jiang, Ming, and Jiang, Shuqiang
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Robotics
Abstract: Vision-and-language navigation (VLN) enables the agent to navigate to a remote location following the natural language instruction in 3D environments. At each navigation step, the agent selects from possible candidate locations and then makes the move. For better navigation planning, the lookahead exploration strategy aims to effectively evaluate the agent's next action by accurately anticipating the future environment of candidate locations. To this end, some existing works predict RGB images for future environments, while this strategy suffers from image distortion and high computational cost. To address these issues, we propose the pre-trained hierarchical neural radiance representation model (HNR) to produce multi-level semantic features for future environments, which are more robust and efficient than pixel-wise RGB reconstruction. Furthermore, with the predicted future environmental representations, our lookahead VLN model is able to construct the navigable future path tree and select the optimal path via efficient parallel evaluation. Extensive experiments on the VLN-CE datasets confirm the effectiveness of our method., Comment: Accepted by CVPR 2024. The code is available at https://github.com/MrZihan/HNR-VLN
Published: 2024

5. Synthesizing Knowledge-enhanced Features for Real-world Zero-shot Food Detection

Author: Zhou, Pengfei, Min, Weiqing, Song, Jiajun, Zhang, Yang, and Jiang, Shuqiang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Food computing brings various perspectives to computer vision like vision-based food analysis for nutrition and health. As a fundamental task in food computing, food detection needs Zero-Shot Detection (ZSD) on novel unseen food objects to support real-world scenarios, such as intelligent kitchens and smart restaurants. Therefore, we first benchmark the task of Zero-Shot Food Detection (ZSFD) by introducing FOWA dataset with rich attribute annotations. Unlike ZSD, fine-grained problems in ZSFD like inter-class similarity make synthesized features inseparable. The complexity of food semantic attributes further makes it more difficult for current ZSD methods to distinguish various food categories. To address these problems, we propose a novel framework ZSFDet to tackle fine-grained problems by exploiting the interaction between complex attributes. Specifically, we model the correlation between food categories and attributes in ZSFDet by multi-source graphs to provide prior knowledge for distinguishing fine-grained features. Within ZSFDet, Knowledge-Enhanced Feature Synthesizer (KEFS) learns knowledge representation from multiple sources (e.g., ingredients correlation from knowledge graph) via the multi-source graph fusion. Conditioned on the fusion of semantic knowledge representation, the region feature diffusion model in KEFS can generate fine-grained features for training the effective zero-shot detector. Extensive evaluations demonstrate the superior performance of our method ZSFDet on FOWA and the widely-used food dataset UECFOOD-256, with significant improvements by 1.8% and 3.7% ZSD mAP compared with the strong baseline RRFS. Further experiments on PASCAL VOC and MS COCO prove that enhancement of the semantic knowledge can also improve the performance on general ZSD. Code and dataset are available at https://github.com/LanceZPF/KEFS., Comment: 14 pages, accepted by IEEE Transactions on Image Processing (2024)
Published: 2024

6. Study on the coordinated development degree of new quality productivity and manufacturing carbon emission efficiency in provincial regions of China

Author: Zhang, Jiajun, Shan, Yongjuan, Jiang, Shuqiang, Xin, Boxiong, Miao, Yutian, and Zhang, Ying
Published: 2024

7. From Plate to Production: Artificial Intelligence in Modern Consumer-Driven Food Systems

Author: Min, Weiqing, Zhou, Pengfei, Xu, Leyi, Liu, Tao, Li, Tianhao, Huang, Mingyu, Jin, Ying, Yi, Yifan, Wen, Min, Jiang, Shuqiang, and Jain, Ramesh
Subjects: Computer Science - Computers and Society
Abstract: Global food systems confront the urgent challenge of supplying sustainable, nutritious diets in the face of escalating demands. The advent of Artificial Intelligence (AI) is bringing in a personal choice revolution, wherein AI-driven individual decisions transform food systems from dinner tables, to the farms, and back to our plates. In this context, AI algorithms refine personal dietary choices, subsequently shaping agricultural outputs, and promoting an optimized feedback loop from consumption to cultivation. Initially, we delve into AI tools and techniques spanning the food supply chain, and subsequently assess how AI subfields–encompassing machine learning, computer vision, and speech recognition–are harnessed within the AI-enabled Food System (AIFS) framework, which increasingly leverages Internet of Things, multimodal sensors and real-time data exchange. We spotlight the AIFS framework, emphasizing its fusion of AI with technologies such as digitalization, big data analytics, biotechnology, and IoT extensively used in modern food systems in every component. This paradigm shifts the conventional "farm to fork" narrative to a cyclical "consumer-driven farm to fork" model for better achieving sustainable, nutritious diets. This paper explores AI's promise and the intrinsic challenges it poses within the food domain. By championing stringent AI governance, uniform data architectures, and cross-disciplinary partnerships, we argue that AI, when synergized with consumer-centric strategies, holds the potential to steer food systems toward a sustainable trajectory. We furnish a comprehensive survey for the state-of-the-art in diverse facets of food systems, subsequently pinpointing gaps and advocating for the judicious and efficacious deployment of emergent AI methodologies.
Published: 2023

8. SeeDS: Semantic Separable Diffusion Synthesizer for Zero-shot Food Detection

Author: Zhou, Pengfei, Min, Weiqing, Zhang, Yang, Song, Jiajun, Jin, Ying, and Jiang, Shuqiang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Food detection is becoming a fundamental task in food computing that supports various multimedia applications, including food recommendation and dietary monitoring. To deal with real-world scenarios, food detection needs to localize and recognize novel food objects that are not seen during training, demanding Zero-Shot Detection (ZSD). However, the complexity of semantic attributes and intra-class feature diversity poses challenges for ZSD methods in distinguishing fine-grained food classes. To tackle this, we propose the Semantic Separable Diffusion Synthesizer (SeeDS) framework for Zero-Shot Food Detection (ZSFD). SeeDS consists of two modules: a Semantic Separable Synthesizing Module (S$^3$M) and a Region Feature Denoising Diffusion Model (RFDDM). The S$^3$M learns the disentangled semantic representation for complex food attributes from ingredients and cuisines, and synthesizes discriminative food features via enhanced semantic information. The RFDDM utilizes a novel diffusion model to generate diversified region features and enhances ZSFD via fine-grained synthesized features. Extensive experiments show the state-of-the-art ZSFD performance of our proposed method on two food datasets, ZSFooD and UECFOOD-256. Moreover, SeeDS also maintains effectiveness on general ZSD datasets, PASCAL VOC and MS COCO. The code and dataset can be found at https://github.com/LanceZPF/SeeDS., Comment: Accepted by ACM Multimedia 2023
Published: 2023

9. GridMM: Grid Memory Map for Vision-and-Language Navigation

Author: Wang, Zihan, Li, Xiangyang, Yang, Jiahao, Liu, Yeqi, and Jiang, Shuqiang
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Vision-and-language navigation (VLN) enables the agent to navigate to a remote location following the natural language instruction in 3D environments. To represent the previously visited environment, most approaches for VLN implement memory using recurrent states, topological maps, or top-down semantic maps. In contrast to these approaches, we build the top-down egocentric and dynamically growing Grid Memory Map (i.e., GridMM) to structure the visited environment. From a global perspective, historical observations are projected into a unified grid map in a top-down view, which can better represent the spatial relations of the environment. From a local perspective, we further propose an instruction relevance aggregation method to capture fine-grained visual clues in each grid region. Extensive experiments are conducted on both the REVERIE, R2R, SOON datasets in the discrete environments, and the R2R-CE dataset in the continuous environments, showing the superiority of our proposed method., Comment: Accepted by ICCV 2023. The code is available at https://github.com/MrZihan/GridMM
Published: 2023

10. KERM: Knowledge Enhanced Reasoning for Vision-and-Language Navigation

Author: Li, Xiangyang, Wang, Zihan, Yang, Jiahao, Wang, Yaowei, and Jiang, Shuqiang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Vision-and-language navigation (VLN) is the task to enable an embodied agent to navigate to a remote location following the natural language instruction in real scenes. Most of the previous approaches utilize the entire features or object-centric features to represent navigable candidates. However, these representations are not efficient enough for an agent to perform actions to arrive the target location. As knowledge provides crucial information which is complementary to visible content, in this paper, we propose a Knowledge Enhanced Reasoning Model (KERM) to leverage knowledge to improve agent navigation ability. Specifically, we first retrieve facts (i.e., knowledge described by language descriptions) for the navigation views based on local regions from the constructed knowledge base. The retrieved facts range from properties of a single object (e.g., color, shape) to relationships between objects (e.g., action, spatial position), providing crucial information for VLN. We further present the KERM which contains the purification, fact-aware interaction, and instruction-guided aggregation modules to integrate visual, history, instruction, and fact features. The proposed KERM can automatically select and gather crucial and relevant cues, obtaining more accurate action prediction. Experimental results on the REVERIE, R2R, and SOON datasets demonstrate the effectiveness of the proposed method., Comment: Accepted by CVPR 2023. The code is available at https://github.com/XiangyangLi20/KERM
Published: 2023

11. Deep Learning for Logo Detection: A Survey

Author: Hou, Sujuan, Li, Jiacheng, Min, Weiqing, Hou, Qiang, Zhao, Yanna, Zheng, Yuanjie, and Jiang, Shuqiang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: When logos are increasingly created, logo detection has gradually become a research hotspot across many domains and tasks. Recent advances in this area are dominated by deep learning-based solutions, where many datasets, learning strategies, network architectures, etc. have been employed. This paper reviews the advance in applying deep learning techniques to logo detection. Firstly, we discuss a comprehensive account of public datasets designed to facilitate performance evaluation of logo detection algorithms, which tend to be more diverse, more challenging, and more reflective of real life. Next, we perform an in-depth analysis of the existing logo detection strategies and the strengths and weaknesses of each learning strategy. Subsequently, we summarize the applications of logo detection in various fields, from intelligent transportation and brand monitoring to copyright and trademark compliance. Finally, we analyze the potential challenges and present the future directions for the development of logo detection to complete this survey.
Published: 2022

12. Hash Food Image Retrieval Based on Enhanced Vision Transformer

Author: Cao, Pindan, Min, Weiqing, Song, Jiajun, Sheng, Guorui, Yang, Yancun, Wang, Lili, and Jiang, Shuqiang
Subjects: food image retrieval, food computing, hash retrieval, vision transformer network, deep hash learning, Food processing and manufacture, TP368-456
Abstract: Food image retrieval, a major task in food computing, has garnered extensive attention in recent years. However, it faces two primary challenges. First, food images exhibit fine-grained characteristics, implying that visual differences between different food categories may be subtle and often can only be observable in local regions of the image. Second, food images contain abundant semantic information, such as ingredients and cooking methods, whose extraction and utilization are crucial for enhancing the retrieval performance. To address these issues, this paper proposes an enhanced ViT hash network (EVHNet) based on a pre-trained Vision Transformer (ViT) model. Given the fine-grained nature of food images, a local feature enhancement module enabling the network to learn more representative features was designed in EVHNet based on convolutional structure. To better leverage the semantic information in food images, an aggregated semantic feature module aggregating the information based on class token features was designed in EVHNet. The proposed EVHNet model was evaluated under three popular hash image retrieval frameworks, namely greedy hash (GreedyHash), central similarity quantization (CSQ), and deep polarized network (DPN), and compared with four mainstream network models, AlexNet, ResNet50, ViT-B_32, and ViT-B_16. Experimental results on the Food-101, Vireo Food-172, and UEC Food-256 food datasets demonstrated that the EVHNet model outperformed other models in terms of comprehensive retrieval accuracy.
Published: 2024

13. Machine learning and statistical models to predict all-cause mortality in type 2 diabetes: Results from the UK Biobank study

Author: Zhang, Tingjing, Huang, Mingyu, Chen, Liangkai, Xia, Yang, Min, Weiqing, and Jiang, Shuqiang
Published: 2024

14. Hierarchical Object-to-Zone Graph for Object Navigation

Author: Zhang, Sixian, Song, Xinhang, Bai, Yubing, Li, Weijie, Chu, Yakui, and Jiang, Shuqiang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The goal of object navigation is to reach the expected objects according to visual information in the unseen environments. Previous works usually implement deep models to train an agent to predict actions in real-time. However, in the unseen environment, when the target object is not in egocentric view, the agent may not be able to make wise decisions due to the lack of guidance. In this paper, we propose a hierarchical object-to-zone (HOZ) graph to guide the agent in a coarse-to-fine manner, and an online-learning mechanism is also proposed to update HOZ according to the real-time observation in new environments. In particular, the HOZ graph is composed of scene nodes, zone nodes and object nodes. With the pre-learned HOZ graph, the real-time observation and the target goal, the agent can constantly plan an optimal path from zone to zone. In the estimated path, the next potential zone is regarded as sub-goal, which is also fed into the deep reinforcement learning model for action prediction. Our methods are evaluated on the AI2-Thor simulator. In addition to widely used evaluation metrics SR and SPL, we also propose a new evaluation metric of SAE that focuses on the effective action rate. Experimental results demonstrate the effectiveness and efficiency of our proposed method., Comment: Accepted by ICCV21
Published: 2021

15. Discriminative Semantic Feature Pyramid Network with Guided Anchoring for Logo Detection

Author: Zhang, Baisong, Min, Weiqing, Wang, Jing, Hou, Sujuan, Hou, Qiang, Zheng, Yuanjie, and Jiang, Shuqiang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Recently, logo detection has received more and more attention for its wide applications in the multimedia field, such as intellectual property protection, product brand management, and logo duration monitoring. Unlike general object detection, logo detection is a challenging task, especially for small logo objects and large aspect ratio logo objects in the real-world scenario. In this paper, we propose a novel approach, named Discriminative Semantic Feature Pyramid Network with Guided Anchoring (DSFP-GA), which can address these challenges via aggregating the semantic information and generating different aspect ratio anchor boxes. More specifically, our approach mainly consists of Discriminative Semantic Feature Pyramid (DSFP) and Guided Anchoring (GA). Considering that low-level feature maps that are used to detect small logo objects lack semantic information, we propose the DSFP, which can enrich more discriminative semantic features of low-level feature maps and can achieve better performance on small logo objects. Furthermore, preset anchor boxes are less efficient for detecting large aspect ratio logo objects. We therefore integrate the GA into our method to generate large aspect ratio anchor boxes to mitigate this issue. Extensive experimental results on four benchmarks demonstrate the effectiveness of our proposed DSFP-GA. Moreover, we further conduct visual analysis and ablation studies to illustrate the advantage of our method in detecting small and large aspect logo objects. The code and models can be found at https://github.com/Zhangbaisong/DSFP-GA., Comment: We are very sorry that the result of the whole experiment is wrong because of the wrong derivation of Equation 3, and we would like to withdraw the manuscript to stop the propagation of the mistake
Published: 2021

16. FoodLogoDet-1500: A Dataset for Large-Scale Food Logo Detection via Multi-Scale Feature Decoupling Network

Author: Hou, Qiang, Min, Weiqing, Wang, Jing, Hou, Sujuan, Zheng, Yuanjie, and Jiang, Shuqiang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Food logo detection plays an important role in the multimedia for its wide real-world applications, such as food recommendation of the self-service shop and infringement detection on e-commerce platforms. A large-scale food logo dataset is urgently needed for developing advanced food logo detection algorithms. However, there are no available food logo datasets with food brand information. To support efforts towards food logo detection, we introduce the dataset FoodLogoDet-1500, a new large-scale publicly available food logo dataset, which has 1,500 categories, about 100,000 images and about 150,000 manually annotated food logo objects. We describe the collection and annotation process of FoodLogoDet-1500, analyze its scale and diversity, and compare it with other logo datasets. To the best of our knowledge, FoodLogoDet-1500 is the first largest publicly available high-quality dataset for food logo detection. The challenge of food logo detection lies in the large-scale categories and similarities between food logo categories. For that, we propose a novel food logo detection method Multi-scale Feature Decoupling Network (MFDNet), which decouples classification and regression into two branches and focuses on the classification branch to solve the problem of distinguishing multiple food logo categories. Specifically, we introduce the feature offset module, which utilizes the deformation-learning for optimal classification offset and can effectively obtain the most representative features of classification in detection. In addition, we adopt a balanced feature pyramid in MFDNet, which pays attention to global information, balances the multi-scale feature maps, and enhances feature extraction capability. Comprehensive experiments on FoodLogoDet-1500 and other two benchmark logo datasets demonstrate the effectiveness of the proposed method. The FoodLogoDet-1500 can be found at this https URL., Comment: This paper has been accepted to ACM MM 2021. The FoodLogoDet-1500, see https://github.com/hq03/FoodLogoDet-1500-Dataset
Published: 2021

17. A review on vision-based analysis for automatic dietary assessment

Author: Wang, Wei, Min, Weiqing, Li, Tianhao, Dong, Xiaoxiao, Li, Haisheng, and Jiang, Shuqiang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Background: Maintaining a healthy diet is vital to avoid health-related issues, e.g., undernutrition, obesity and many non-communicable diseases. An indispensable part of the health diet is dietary assessment. Traditional manual recording methods are not only burdensome but time-consuming, and contain substantial biases and errors. Recent advances in Artificial Intelligence (AI), especially computer vision technologies, have made it possible to develop automatic dietary assessment solutions, which are more convenient, less time-consuming and even more accurate to monitor daily food intake. Scope and approach: This review presents Vision-Based Dietary Assessment (VBDA) architectures, including multi-stage architecture and end-to-end one. The multi-stage dietary assessment generally consists of three stages: food image analysis, volume estimation and nutrient derivation. The prosperity of deep learning makes VBDA gradually move to an end-to-end implementation, which applies food images to a single network to directly estimate the nutrition. The recently proposed end-to-end methods are also discussed. We further analyze existing dietary assessment datasets, indicating that one large-scale benchmark is urgently needed, and finally highlight critical challenges and future trends for VBDA. Key findings and conclusions: After thorough exploration, we find that multi-task end-to-end deep learning approaches are one important trend of VBDA. Despite considerable research progress, many challenges remain for VBDA due to the meal complexity. We also provide the latest ideas for future development of VBDA, e.g., fine-grained food analysis and accurate volume estimation. This review aims to encourage researchers to propose more practical solutions for VBDA., Comment: Accepted by Trends in Food Science & Technology
Published: 2021

18. Applications of knowledge graphs for food science and industry

Author: Min, Weiqing, Liu, Chunlin, Xu, Leyi, and Jiang, Shuqiang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The deployment of various networks (e.g., Internet of Things [IoT] and mobile networks), databases (e.g., nutrition tables and food compositional databases), and social media (e.g., Instagram and Twitter) generates huge amounts of food data, which present researchers with an unprecedented opportunity to study various problems and applications in food science and industry via data-driven computational methods. However, these multi-source heterogeneous food data appear as information silos, leading to difficulty in fully exploiting these food data. The knowledge graph provides a unified and standardized conceptual terminology in a structured form, and thus can effectively organize these food data to benefit various applications. In this review, we provide a brief introduction to knowledge graphs and the evolution of food knowledge organization mainly from food ontology to food knowledge graphs. We then summarize seven representative applications of food knowledge graphs, such as new recipe development, diet-disease correlation discovery, and personalized dietary recommendation. We also discuss future directions in this field, such as multimodal food knowledge graph construction and food knowledge graphs for human health., Comment: 45 pages, 6 figures
Published: 2021

19. Automated Segmentation and Classification of Knee Synovitis Based on MRI Using Deep Learning

Author: Wang, Qizheng, Yao, Meiyi, Song, Xinhang, Liu, Yandong, Xing, Xiaoying, Chen, Yongye, Zhao, Fangbo, Liu, Ke, Cheng, Xiaoguang, Jiang, Shuqiang, and Lang, Ning
Published: 2024

20. Large Scale Visual Food Recognition

Author: Min, Weiqing, Wang, Zhiling, Liu, Yuxin, Luo, Mengjiang, Kang, Liping, Wei, Xiaoming, Wei, Xiaolin, and Jiang, Shuqiang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Food recognition plays an important role in food choice and intake, which is essential to the health and well-being of humans. It is thus of importance to the computer vision community, and can further support many food-oriented vision and multimodal tasks. Unfortunately, we have witnessed remarkable advancements in generic visual recognition for released large-scale datasets, yet largely lags in the food domain. In this paper, we introduce Food2K, which is the largest food recognition dataset with 2,000 categories and over 1 million images. Compared with existing food recognition datasets, Food2K bypasses them in both categories and images by one order of magnitude, and thus establishes a new challenging benchmark to develop advanced models for food visual representation learning. Furthermore, we propose a deep progressive region enhancement network for food recognition, which mainly consists of two components, namely progressive local feature learning and region feature enhancement. The former adopts improved progressive training to learn diverse and complementary local features, while the latter utilizes self-attention to incorporate richer context with multiple scales into local features for further local feature enhancement. Extensive experiments on Food2K demonstrate the effectiveness of our proposed method. More importantly, we have verified better generalization ability of Food2K in various tasks, including food recognition, food image retrieval, cross-modal recipe retrieval, food detection and segmentation. Food2K can be further explored to benefit more food-relevant tasks including emerging and more complex ones (e.g., nutritional understanding of food), and the trained models on Food2K can be expected as backbones to improve the performance of more food-relevant tasks. We also hope Food2K can serve as a large scale fine-grained visual recognition benchmark., Comment: Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence
Published: 2021

21. Rethinking the Optimization of Average Precision: Only Penalizing Negative Instances before Positive Ones is Enough

Author: Li, Zhuo, Min, Weiqing, Song, Jiajun, Zhu, Yaohui, Kang, Liping, Wei, Xiaoming, Wei, Xiaolin, and Jiang, Shuqiang
Subjects: Computer Science - Information Retrieval, Computer Science - Artificial Intelligence
Abstract: Optimizing the approximation of Average Precision (AP) has been widely studied for image retrieval. Limited by the definition of AP, such methods consider both negative and positive instances ranked before each positive instance. However, we claim that only penalizing negative instances before positive ones is enough, because the loss only comes from these negative instances. To this end, we propose a novel loss, namely Penalizing Negative instances before Positive ones (PNP), which can directly minimize the number of negative instances before each positive one. In addition, AP-based methods adopt a fixed and sub-optimal gradient assignment strategy. Therefore, we systematically investigate different gradient assignment solutions via constructing derivative functions of the loss, resulting in PNP-I with increasing derivative functions and PNP-D with decreasing ones. PNP-I focuses more on the hard positive instances by assigning larger gradients to them and tries to make all relevant instances closer. In contrast, PNP-D pays less attention to such instances and slowly corrects them. For most real-world data, one class usually contains several local clusters. PNP-I blindly gathers these clusters while PNP-D keeps them as they were. Therefore, PNP-D is superior. Experiments on three standard retrieval datasets show consistent results with the above analysis. Extensive evaluations demonstrate that PNP-D achieves the state-of-the-art performance. Code is available at https://github.com/interestingzhuo/PNPloss
Published: 2021

22. Dataset Bias in Few-shot Image Recognition

Author: Jiang, Shuqiang, Zhu, Yaohui, Liu, Chenlong, Song, Xinhang, Li, Xiangyang, and Min, Weiqing
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The goal of few-shot image recognition (FSIR) is to identify novel categories with a small number of annotated samples by exploiting transferable knowledge from training data (base categories). Most current studies assume that the transferable knowledge can be well used to identify novel categories. However, such transferable capability may be impacted by the dataset bias, and this problem has rarely been investigated before. Besides, most few-shot learning methods are biased toward different datasets, which is also an important issue that needs to be investigated deeply. In this paper, we first investigate the impact of transferable capabilities learned from base categories. Specifically, we use the relevance to measure relationships between base categories and novel categories. Distributions of base categories are depicted via the instance density and category diversity. The FSIR model learns better transferable knowledge from relevant training data. In the relevant data, dense instances or diverse categories can further enrich the learned knowledge. Experimental results on different sub-datasets of ImageNet demonstrate that category relevance, instance density and category diversity can depict transferable bias from base categories. Second, we investigate performance differences on different datasets from dataset structures and different few-shot learning methods. Specifically, we introduce image complexity, intra-concept visual consistency, and inter-concept visual similarity to quantify characteristics of dataset structures. We use these quantitative characteristics and four few-shot learning methods to analyze performance differences on five different datasets. Based on the experimental analysis, some insightful observations are obtained from the perspective of both dataset structures and few-shot learning methods. We hope these observations are useful to guide future FSIR research.
Published: 2020

23. ISIA Food-500: A Dataset for Large-Scale Food Recognition via Stacked Global-Local Attention Network

Author: Min, Weiqing, Liu, Linhu, Wang, Zhiling, Luo, Zhengdong, Wei, Xiaoming, Wei, Xiaolin, and Jiang, Shuqiang
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Multimedia
Abstract: Food recognition has received more and more attention in the multimedia community for its various real-world applications, such as diet management and self-service restaurants. A large-scale ontology of food images is urgently needed for developing advanced large-scale food recognition algorithms, as well as for providing the benchmark dataset for such algorithms. To encourage further progress in food recognition, we introduce the dataset ISIA Food-500 with 500 categories from the list in Wikipedia and 399,726 images, a more comprehensive food dataset that surpasses existing popular benchmark datasets by category coverage and data volume. Furthermore, we propose a stacked global-local attention network, which consists of two sub-networks for food recognition. One subnetwork first utilizes hybrid spatial-channel attention to extract more discriminative features, and then aggregates these multi-scale discriminative features from multiple layers into global-level representation (e.g., texture and shape information about food). The other one generates attentional regions (e.g., ingredient relevant regions) from different regions via cascaded spatial transformers, and further aggregates these multi-scale regional features from different layers into local-level representation. These two types of features are finally fused as comprehensive representation for food recognition. Extensive experiments on ISIA Food-500 and two other popular benchmark datasets demonstrate the effectiveness of our proposed method, which can thus be considered a strong baseline. The dataset, code and models can be found at http://123.57.42.89/FoodComputing-Dataset/ISIA-Food500.html., Comment: Accepted by ACM Multimedia 2020
Published: 2020

24. LogoDet-3K: A Large-Scale Image Dataset for Logo Detection

Author: Wang, Jing, Min, Weiqing, Hou, Sujuan, Ma, Shengnan, Zheng, Yuanjie, and Jiang, Shuqiang
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Multimedia
Abstract: Logo detection has been gaining considerable attention because of its wide range of applications in the multimedia field, such as copyright infringement detection, brand visibility monitoring, and product brand management on social media. In this paper, we introduce LogoDet-3K, the largest logo detection dataset with full annotation, which has 3,000 logo categories, about 200,000 manually annotated logo objects and 158,652 images. LogoDet-3K creates a more challenging benchmark for logo detection, for its higher comprehensive coverage and wider variety in both logo categories and annotated objects compared with existing datasets. We describe the collection and annotation process of our dataset, and analyze its scale and diversity in comparison to other datasets for logo detection. We further propose a strong baseline method Logo-Yolo, which incorporates Focal loss and CIoU loss into the state-of-the-art YOLOv3 framework for large-scale logo detection. Logo-Yolo can solve the problems of multi-scale objects, logo sample imbalance and inconsistent bounding-box regression. It obtains about 4% improvement on the average performance compared with YOLOv3, and greater improvements compared with several reported deep detection models on LogoDet-3K. The evaluations on three other existing datasets further verify the effectiveness of our method, and demonstrate better generalization ability of LogoDet-3K on logo detection and retrieval tasks. The LogoDet-3K dataset is used to promote large-scale logo-related research and it can be found at https://github.com/Wangjing1551/LogoDet-3K-Dataset.
Published: 2020

25. Vision-based food nutrition estimation via RGB-D fusion network

Author: Shao, Wenjing, Min, Weiqing, Hou, Sujuan, Luo, Mengjiang, Li, Tianhao, Zheng, Yuanjie, and Jiang, Shuqiang
Published: 2023
Full Text: View/download PDF

26. Logo-2K+: A Large-Scale Logo Dataset for Scalable Logo Classification

Author: Wang, Jing, Min, Weiqing, Hou, Sujuan, Ma, Shengnan, Zheng, Yuanjie, Wang, Haishuai, and Jiang, Shuqiang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Logo classification has gained increasing attention for its various applications, such as copyright infringement detection, product recommendation and contextual advertising. Compared with other types of object images, the real-world logo images have larger variety in logo appearance and more complexity in their background. Therefore, recognizing the logo from images is challenging. To support efforts towards scalable logo classification task, we have curated a dataset, Logo-2K+, a new large-scale publicly available real-world logo dataset with 2,341 categories and 167,140 images. Compared with existing popular logo datasets, such as FlickrLogos-32 and LOGO-Net, Logo-2K+ has more comprehensive coverage of logo categories and larger quantity of logo images. Moreover, we propose a Discriminative Region Navigation and Augmentation Network (DRNA-Net), which is capable of discovering more informative logo regions and augmenting these image regions for logo classification. DRNA-Net consists of four sub-networks: the navigator sub-network first selects informative logo-relevant regions guided by the teacher sub-network, which can evaluate its confidence belonging to the ground-truth logo class. The data augmentation sub-network then augments the selected regions via both region cropping and region dropping. Finally, the scrutinizer sub-network fuses features from augmented regions and the whole image for logo classification. Comprehensive experiments on Logo-2K+ and three other existing benchmark datasets demonstrate the effectiveness of the proposed method. Logo-2K+ and the proposed strong baseline DRNA-Net are expected to further the development of scalable logo image recognition, and the Logo-2K+ dataset can be found at https://github.com/msn199959/Logo-2k-plus-Dataset., Comment: Accepted by AAAI2020
Published: 2019

27. Scene Recognition with Prototype-agnostic Scene Layout

Author: Chen, Gongwei, Song, Xinhang, Zeng, Haitao, and Jiang, Shuqiang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Exploiting the spatial structure in scene images is a key research direction for scene recognition. Due to the large intra-class structural diversity, building and modeling flexible structural layout to adapt various image characteristics is a challenge. Existing structural modeling methods in scene recognition either focus on predefined grids or rely on learned prototypes, which all have limited representative ability. In this paper, we propose Prototype-agnostic Scene Layout (PaSL) construction method to build the spatial structure for each image without conforming to any prototype. Our PaSL can flexibly capture the diverse spatial characteristic of scene images and has considerable generalization capability. Given a PaSL, we build Layout Graph Network (LGN) where regions in PaSL are defined as nodes and two kinds of independent relations between regions are encoded as edges. The LGN aims to incorporate two topological structures (formed in spatial and semantic similarity dimensions) into image representations through graph convolution. Extensive experiments show that our approach achieves state-of-the-art results on widely recognized MIT67 and SUN397 datasets without multi-model or multi-scale fusion. Moreover, we also conduct the experiments on one of the largest scale datasets, Places365. The results demonstrate the proposed method can be well generalized and obtains competitive performance.
Published: 2019
Full Text: View/download PDF

28. Multifaceted Analysis of Fine-Tuning in Deep Model for Visual Recognition

Author: Li, Xiangyang, Herranz, Luis, and Jiang, Shuqiang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In recent years, convolutional neural networks (CNNs) have achieved impressive performance for various visual recognition scenarios. CNNs trained on large labeled datasets can not only obtain significant performance on most challenging benchmarks but also provide powerful representations, which can be used for a wide range of other tasks. However, the requirement of massive amounts of data to train deep neural networks is a major drawback of these models, as the data available is usually limited or imbalanced. Fine-tuning (FT) is an effective way to transfer knowledge learned in a source dataset to a target task. In this paper, we introduce and systematically investigate several factors that influence the performance of fine-tuning for visual recognition. These factors include parameters for the retraining procedure (e.g., the initial learning rate of fine-tuning), the distribution of the source and target data (e.g., the number of categories in the source dataset, the distance between the source and target datasets) and so on. We quantitatively and qualitatively analyze these factors, evaluate their influence, and present many empirical observations. The results reveal insights into how fine-tuning changes CNN parameters and provide useful and evidence-backed intuitions about how to implement fine-tuning for computer vision tasks., Comment: Accepted by ACM Transactions on Data Science
Published: 2019
Full Text: View/download PDF

29. Food Recommendation: Framework, Existing Solutions and Challenges

Author: Min, Weiqing, Jiang, Shuqiang, and Jain, Ramesh
Subjects: Computer Science - Computers and Society, Computer Science - Information Retrieval, Computer Science - Multimedia
Abstract: A growing proportion of the global population is becoming overweight or obese, leading to various diseases (e.g., diabetes, ischemic heart disease and even cancer) due to unhealthy eating patterns, such as increased intake of food with high energy and high fat. Food recommendation is of paramount importance to alleviate this problem. Unfortunately, modern multimedia research has enhanced the performance and experience of multimedia recommendation in many fields such as movies and POI, yet largely lags in the food domain. This article proposes a unified framework for food recommendation, and identifies main issues affecting food recommendation including building the personal model, analyzing unique food characteristics, incorporating various context and domain knowledge. We then review existing solutions for these issues, and finally elaborate research challenges and future directions in this field. To our knowledge, this is the first survey that targets the study of food recommendation in the multimedia field and offers a collection of research studies and technologies to benefit researchers in this field., Comment: Accepted by IEEE Transactions on Multimedia
Published: 2019

30. Vision-based fruit recognition via multi-scale attention CNN

Author: Min, Weiqing, Wang, Zhiling, Yang, Jiahao, Liu, Chunlin, and Jiang, Shuqiang
Published: 2023
Full Text: View/download PDF

31. Generative Meta-Adversarial Network for Unseen Object Navigation

Author: Zhang, Sixian, Li, Weijie, Song, Xinhang, Bai, Yubing, Jiang, Shuqiang, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Avidan, Shai, editor, Brostow, Gabriel, editor, Cissé, Moustapha, editor, Farinella, Giovanni Maria, editor, and Hassner, Tal, editor
Published: 2022
Full Text: View/download PDF

32. Electrochemical-induced cross-coupling for organic and bioorganic synthesis

Author: Weng, Yue, Wan, Chenggang, Sun, Rong, Jiang, Shuqiang, Yang, Guichun, and Lei, Aiwen
Published: 2023
Full Text: View/download PDF

33. Hierarchy-Dependent Cross-Platform Multi-View Feature Learning for Venue Category Prediction

Author: Jiang, Shuqiang, Min, Weiqing, and Mei, Shuhuan
Subjects: Computer Science - Multimedia
Abstract: In this work, we focus on visual venue category prediction, which can facilitate various applications for location-based service and personalization. Considering the complementarity of different media platforms, it is reasonable to leverage venue-relevant media data from different platforms to boost the prediction performance. Intuitively, recognizing one venue category involves multiple semantic cues, especially objects and scenes, and thus they should contribute together to venue category prediction. In addition, these venues can be organized in a natural hierarchical structure, which provides prior knowledge to guide venue category estimation. Taking these aspects into account, we propose a Hierarchy-dependent Cross-platform Multi-view Feature Learning (HCM-FL) framework for venue category prediction from videos by leveraging images from other platforms. HCM-FL includes two major components, namely Cross-Platform Transfer Deep Learning (CPTDL) and Multi-View Feature Learning with the Hierarchical Venue Structure (MVFL-HVS). CPTDL is capable of reinforcing the learned deep network from videos using images from other platforms. Specifically, CPTDL first trains a deep network using videos. Images from other platforms are then filtered by the learnt network, and the selected images are fed into this learnt network to enhance it. Two kinds of pre-trained networks on the ImageNet and Places dataset are employed. MVFL-HVS is then developed to enable multi-view feature fusion. It is capable of embedding the hierarchical structure ontology to support more discriminative joint feature learning. We conduct the experiment on videos from Vine and images from Foursquare. These experimental results demonstrate the advantage of our proposed framework., Comment: Accepted by IEEE Transactions on Multimedia
Published: 2018

34. Learning Effective RGB-D Representations for Scene Recognition

Author: Song, Xinhang, Jiang, Shuqiang, Herranz, Luis, and Chen, Chengpeng
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Deep convolutional networks (CNN) can achieve impressive results on RGB scene recognition thanks to large datasets such as Places. In contrast, RGB-D scene recognition is still underdeveloped in comparison, due to two limitations of RGB-D data we address in this paper. The first limitation is the lack of depth data for training deep learning models. Rather than fine tuning or transferring RGB-specific features, we address this limitation by proposing an architecture and a two-step training approach that directly learns effective depth-specific features using weak supervision via patches. The resulting RGB-D model also benefits from more complementary multimodal features. Another limitation is the short range of depth sensors (typically 0.5m to 5.5m), resulting in depth images not capturing distant objects in the scenes that RGB images can. We show that this limitation can be addressed by using RGB-D videos, where more comprehensive depth information is accumulated as the camera travels across the scene. Focusing on this scenario, we introduce the ISIA RGB-D video dataset to evaluate RGB-D scene recognition with videos. Our video recognition architecture combines convolutional and recurrent neural networks (RNNs) that are trained in three steps with increasingly complex data to learn effective features (i.e. patches, frames and sequences). Our approach obtains state-of-the-art performances on RGB-D image (NYUD2 and SUN RGB-D) and video (ISIA RGB-D) scene recognition., Comment: Accepted at IEEE Transactions on Image Processing
Published: 2018
Full Text: View/download PDF

35. A Survey on Food Computing

Author: Min, Weiqing, Jiang, Shuqiang, Liu, Linhu, Rui, Yong, and Jain, Ramesh
Subjects: Computer Science - Computers and Society, Computer Science - Multimedia
Abstract: Food is essential for human life and it is fundamental to the human experience. Food-related study may support multifarious applications and services, such as guiding the human behavior, improving the human health and understanding the culinary culture. With the rapid development of social networks, mobile networks, and Internet of Things (IoT), people commonly upload, share, and record food images, recipes, cooking videos, and food diaries, leading to large-scale food data. Large-scale food data offers rich knowledge about food and can help tackle many central issues of human society. Therefore, it is time to group several disparate issues related to food computing. Food computing acquires and analyzes heterogenous food data from disparate sources for perception, recognition, retrieval, recommendation, and monitoring of food. In food computing, computational approaches are applied to address food related issues in medicine, biology, gastronomy and agronomy. Both large-scale food data and recent breakthroughs in computer science are transforming the way we analyze food data. Therefore, a vast amount of work has been conducted in the food area, targeting different food-oriented tasks and applications. However, there are very few systematic reviews that shape this area well and provide a comprehensive and in-depth summary of current efforts or detail open problems in this area. In this paper, we formalize food computing and present such a comprehensive overview of various emerging concepts, methods, and tasks. We summarize key challenges and future directions ahead for food computing. This is the first comprehensive survey that targets the study of computing technology for the food area and also offers a collection of research studies and technologies to benefit researchers and practitioners working in different food-related fields., Comment: Accepted by ACM Computing Surveys
Published: 2018

36. Food recognition and recipe analysis: integrating visual content, context and external knowledge

Author: Herranz, Luis, Min, Weiqing, and Jiang, Shuqiang
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Multimedia
Abstract: The central role of food in our individual and social life, combined with recent technological advances, has motivated a growing interest in applications that help to better monitor dietary habits as well as the exploration and retrieval of food-related information. We review how visual content, context and external knowledge can be integrated effectively into food-oriented applications, with special focus on recipe analysis and retrieval, food recommendation, and the restaurant context as emerging directions., Comment: Survey about contextual food recognition and multimodal recipe analysis
Published: 2018

37. Scene recognition with CNNs: objects, scales and dataset bias

Author: Herranz, Luis, Jiang, Shuqiang, and Li, Xiangyang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Since scenes are composed in part of objects, accurate recognition of scenes requires knowledge about both scenes and objects. In this paper we address two related problems: 1) scale induced dataset bias in multi-scale convolutional neural network (CNN) architectures, and 2) how to combine effectively scene-centric and object-centric knowledge (i.e. Places and ImageNet) in CNNs. An earlier attempt, Hybrid-CNN, showed that incorporating ImageNet did not help much. Here we propose an alternative method taking the scale into account, resulting in significant recognition gains. By analyzing the response of ImageNet-CNNs and Places-CNNs at different scales we find that both operate in different scale ranges, so using the same network for all the scales induces dataset bias resulting in limited performance. Thus, adapting the feature extractor to each particular scale (i.e. scale-specific CNNs) is crucial to improve recognition, since the objects in the scenes have their specific range of scales. Experimental results show that the recognition accuracy highly depends on the scale, and that simple yet carefully chosen multi-scale combinations of ImageNet-CNNs and Places-CNNs, can push the state-of-the-art recognition accuracy in SUN397 up to 66.26% (and even 70.17% with deeper architectures, comparable to human performance)., Comment: CVPR 2016
Published: 2018
Full Text: View/download PDF

38. Depth CNNs for RGB-D scene recognition: learning from scratch better than transferring from RGB-CNNs

Author: Song, Xinhang, Herranz, Luis, and Jiang, Shuqiang
Subjects: Computer Science - Computer Vision and Pattern Recognition, Statistics - Machine Learning
Abstract: Scene recognition with RGB images has been extensively studied and has reached very remarkable recognition levels, thanks to convolutional neural networks (CNN) and large scene datasets. In contrast, current RGB-D scene data is much more limited, so it often leverages large RGB datasets, by transferring pretrained RGB CNN models and fine-tuning with the target RGB-D dataset. However, we show that this approach has the limitation of hardly reaching bottom layers, which is key to learn modality-specific features. In contrast, we focus on the bottom layers, and propose an alternative strategy to learn depth features combining local weakly supervised training from patches followed by global fine tuning with images. This strategy is capable of learning very discriminative depth-specific features with limited depth images, without resorting to Places-CNN. In addition we propose a modified CNN architecture to further match the complexity of the model and the amount of data available. For RGB-D scene recognition, depth and RGB features are combined by projecting them in a common space and further learning a multilayer classifier, which is jointly optimized in an end-to-end network. Our framework achieves state-of-the-art accuracy on NYU2 and SUN RGB-D in both depth only and combined RGB-D data., Comment: AAAI Conference on Artificial Intelligence 2017
Published: 2018

39. Applications of knowledge graphs for food science and industry

Author: Min, Weiqing, Liu, Chunlin, Xu, Leyi, and Jiang, Shuqiang
Published: 2022
Full Text: View/download PDF

40. Lightweight Food Recognition via Aggregation Block and Feature Encoding.

Author: Yang, Yancun, Min, Weiqing, Song, Jingru, Sheng, Guorui, Wang, Lili, and Jiang, Shuqiang
Subjects: IMAGE recognition (Computer vision), DATA mining, SOURCE code, ENCODING
Abstract: Food image recognition has recently been given considerable attention in the multimedia field in light of its possible implications on health. The characteristics of the dispersed distribution of ingredients in food images put forward higher requirements on the long-range information extraction ability of neural networks, leading to more complex and deeper models. Nevertheless, the lightweight version of food image recognition is essential for improved implementation on end devices and sustained server-side expansion. To address this issue, we present Aggregation Feature Net (AFNet), a lightweight network that is capable of effectively capturing both global and local features from food images. In AFNet, we develop a novel convolution based on a residual model by encoding global features through row-wise and column-wise information integration. Merging aggregation block with classic local convolution yields a framework that works as the backbone of the network. Based on the efficient use of parameters by the aggregation block, we constructed a lightweight food image recognition network with fewer layers and a smaller scale, assisted by a new type of activation function. Experimental results on four popular food recognition datasets demonstrate that our approach achieves state-of-the-art performance with higher accuracy and fewer FLOPs and parameters. For example, in comparison to the current state-of-the-art model of MobileViTv2, AFNet achieved 88.4% accuracy of the top-1 level on the ETHZ Food-101 dataset, with similar parameters and FLOPs but 1.4% more accuracy. The source code will be provided in supplementary materials. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

41. A Lightweight Hybrid Model with Location-Preserving ViT for Efficient Food Recognition

Author: Sheng, Guorui, primary, Min, Weiqing, additional, Zhu, Xiangyi, additional, Xu, Liang, additional, Sun, Qingshuo, additional, Yang, Yancun, additional, Wang, Lili, additional, and Jiang, Shuqiang, additional
Published: 2024
Full Text: View/download PDF

42. Convolution-Enhanced Bi-Branch Adaptive Transformer With Cross-Task Interaction for Food Category and Ingredient Recognition

Author: Liu, Yuxin, primary, Min, Weiqing, additional, Jiang, Shuqiang, additional, and Rui, Yong, additional
Published: 2024
Full Text: View/download PDF

43. Synthesizing Knowledge-Enhanced Features for Real-World Zero-Shot Food Detection

Author: Zhou, Pengfei, primary, Min, Weiqing, additional, Song, Jiajun, additional, Zhang, Yang, additional, and Jiang, Shuqiang, additional
Published: 2024
Full Text: View/download PDF

44. Electrocatalytic O-S Bonding Reaction Targeting Biological Macromolecules

Author: Jiang, ShuQiang, primary, Xiao, Longyu, additional, Pan, Li, additional, Huang, Qiaoyu, additional, Huo, Fujin, additional, Gao, Meng, additional, Lu, Cuifen, additional, Wu, Pan, additional, and Weng, Yue, additional
Published: 2024
Full Text: View/download PDF

45. Towards Egocentric Compositional Action Anticipation with Adaptive Semantic Debiasing

Author: Zhang, Tianyu, primary, Min, Weiqing, additional, Liu, Tao, additional, Jiang, Shuqiang, additional, and Rui, Yong, additional
Published: 2023
Full Text: View/download PDF

46. Indoor RGB-D Object Detection with the Guidance of Hand-Held Objects

Author: Qiao, Leixian, Zhu, Yaohui, Li, Runze, Min, Weiqing, Jiang, Shuqiang, Barbosa, Simone Diniz Junqueira, Series Editor, Chen, Phoebe, Series Editor, Filipe, Joaquim, Series Editor, Kotenko, Igor, Series Editor, Sivalingam, Krishna M., Series Editor, Washio, Takashi, Series Editor, Yuan, Junsong, Series Editor, Zhou, Lizhu, Series Editor, Huet, Benoit, editor, Nie, Liqiang, editor, and Hong, Richang, editor
Published: 2018
Full Text: View/download PDF

47. Focal Loss for Region Proposal Network

Author: Chen, Chengpeng, Song, Xinhang, Jiang, Shuqiang, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Lai, Jian-Huang, editor, Liu, Cheng-Lin, editor, Chen, Xilin, editor, Zhou, Jie, editor, Tan, Tieniu, editor, Zheng, Nanning, editor, and Zha, Hongbin, editor
Published: 2018
Full Text: View/download PDF

48. Hybrid incremental learning of new data and new classes for hand-held object recognition

Author: Chen, Chengpeng, Min, Weiqing, Li, Xue, and Jiang, Shuqiang
Published: 2019
Full Text: View/download PDF

49. Toward Egocentric Compositional Action Anticipation with Adaptive Semantic Debiasing.

Author: Zhang, Tianyu, Min, Weiqing, Liu, Tao, Jiang, Shuqiang, and Rui, Yong
Subjects: ARTIFICIAL neural networks, ARTIFICIAL intelligence, EXPECTATION (Psychology), COUNTERFACTUALS (Logic), OPTICAL tweezers
Abstract: Predicting the unknown from the first-person perspective is expected as a necessary step toward machine intelligence, which is essential for practical applications including autonomous driving and robotics. As a human-level task, egocentric action anticipation aims at predicting an unknown action seconds before it is performed from the first-person viewpoint. Egocentric actions are usually provided as verb-noun pairs; however, predicting the unknown action may be trapped in insufficient training data for all possible combinations. Therefore, it is crucial for intelligent systems to use limited known verb-noun pairs to predict new combinations of actions that have never appeared, which is known as compositional generalization. In this article, we are the first to explore the egocentric compositional action anticipation problem, which is more in line with real-world settings but neglected by existing studies. Whereas prediction results are prone to suffer from semantic bias considering the distinct difference between training and test distributions, we further introduce a general and flexible adaptive semantic debiasing framework that is compatible with different deep neural networks. To capture and mitigate semantic bias, we can imagine one counterfactual situation where no visual representations have been observed and only semantic patterns of observation are used to predict the next action. Instead of the traditional counterfactual analysis scheme that reduces semantic bias in a mindless way, we devise a novel counterfactual analysis scheme to adaptively amplify or penalize the effect of semantic experience by considering the discrepancy both among categories and among examples. We also demonstrate that the traditional counterfactual analysis scheme is a special case of the devised adaptive counterfactual analysis scheme. We conduct experiments on three large-scale egocentric video datasets. Experimental results verify the superiority and effectiveness of our proposed solution. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

50. Automated Segmentation and Classification of Knee Synovitis Based on MRI Using Deep Learning

Author: Wang, Qizheng, primary, Yao, Meiyi, additional, Song, Xinhang, additional, Liu, Yandong, additional, Xing, Xiaoying, additional, Chen, Yongye, additional, Zhao, Fangbo, additional, Liu, Ke, additional, Cheng, Xiaoguang, additional, Jiang, Shuqiang, additional, and Lang, Ning, additional
Published: 2023
Full Text: View/download PDF
