Author: "Fu, Siming" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Fu, Siming"' showing total 23 results

1. Resolving Multi-Condition Confusion for Finetuning-Free Personalized Image Generation

Author: Huang, Qihan, Fu, Siming, Liu, Jinlong, Jiang, Hao, Yu, Yipeng, and Song, Jie
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Personalized text-to-image generation methods can generate customized images based on the reference images, which have garnered wide research interest. Recent methods propose a finetuning-free approach with a decoupled cross-attention mechanism to generate personalized images requiring no test-time finetuning. However, when multiple reference images are provided, the current decoupled cross-attention mechanism encounters the object confusion problem and fails to map each reference image to its corresponding object, thereby seriously limiting its scope of application. To address the object confusion problem, in this work we investigate the relevance of different positions of the latent image features to the target object in diffusion model, and accordingly propose a weighted-merge method to merge multiple reference image features into the corresponding objects. Next, we integrate this weighted-merge method into existing pre-trained models and continue to train the model on a multi-object dataset constructed from the open-sourced SA-1B dataset. To mitigate object confusion and reduce training costs, we propose an object quality score to estimate the image quality for the selection of high-quality training samples. Furthermore, our weighted-merge training framework can be employed on single-object generation when a single object has multiple reference images. The experiments verify that our method achieves superior performance to the state-of-the-arts on the Concept101 dataset and DreamBooth dataset of multi-object personalized image generation, and remarkably improves the performance on single-object personalized image generation. Our code is available at https://github.com/hqhQAQ/MIP-Adapter.
Published: 2024

2. LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation

Author: Shu, Fangxun, Liao, Yue, Zhuo, Le, Xu, Chenning, Zhang, Lei, Zhang, Guanghao, Shi, Haonan, Chen, Long, Zhong, Tao, He, Wanggui, Fu, Siming, Li, Haoyuan, Li, Bolin, Yu, Zhelun, Liu, Si, Li, Hongsheng, and Jiang, Hao
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We introduce LLaVA-MoD, a novel framework designed to enable the efficient training of small-scale Multimodal Language Models (s-MLLM) by distilling knowledge from large-scale MLLM (l-MLLM). Our approach tackles two fundamental challenges in MLLM distillation. First, we optimize the network structure of s-MLLM by integrating a sparse Mixture of Experts (MoE) architecture into the language model, striking a balance between computational efficiency and model expressiveness. Second, we propose a progressive knowledge transfer strategy to ensure comprehensive knowledge migration. This strategy begins with mimic distillation, where we minimize the Kullback-Leibler (KL) divergence between output distributions to enable the student model to emulate the teacher network's understanding. Following this, we introduce preference distillation via Direct Preference Optimization (DPO), where the key lies in treating l-MLLM as the reference model. During this phase, the s-MLLM's ability to discriminate between superior and inferior examples is significantly enhanced beyond l-MLLM, leading to a better student that surpasses its teacher, particularly in hallucination benchmarks. Extensive experiments demonstrate that LLaVA-MoD outperforms existing models across various multimodal benchmarks while maintaining a minimal number of activated parameters and low computational costs. Remarkably, LLaVA-MoD, with only 2B activated parameters, surpasses Qwen-VL-Chat-7B by an average of 8.8% across benchmarks, using merely 0.3% of the training data and 23% trainable parameters. These results underscore LLaVA-MoD's ability to effectively distill comprehensive knowledge from its teacher model, paving the way for the development of more efficient MLLMs. The code will be available on: https://github.com/shufangxun/LLaVA-MoD.
Published: 2024

3. MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis

Author: He, Wanggui, Fu, Siming, Liu, Mushui, Wang, Xierui, Xiao, Wenyi, Shu, Fangxun, Wang, Yi, Zhang, Lei, Yu, Zhelun, Li, Haoyuan, Huang, Ziwei, Gan, LeiLei, and Jiang, Hao
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Auto-regressive models have made significant progress in the realm of language generation, yet they do not perform on par with diffusion models in the domain of image synthesis. In this work, we introduce MARS, a novel framework for T2I generation that incorporates a specially designed Semantic Vision-Language Integration Expert (SemVIE). This innovative component integrates pre-trained LLMs by independently processing linguistic and visual information, freezing the textual component while fine-tuning the visual component. This methodology preserves the NLP capabilities of LLMs while imbuing them with exceptional visual understanding. Building upon the powerful base of the pre-trained Qwen-7B, MARS stands out with its bilingual generative capabilities corresponding to both English and Chinese language prompts and the capacity for joint image and text generation. The flexibility of this framework lends itself to migration towards any-to-any task adaptability. Furthermore, MARS employs a multi-stage training strategy that first establishes robust image-text alignment through complementary bidirectional tasks and subsequently concentrates on refining the T2I generation process, significantly augmenting text-image synchrony and the granularity of image details. Notably, MARS requires only 9% of the GPU days needed by SD1.5, yet it achieves remarkable results across a variety of benchmarks, illustrating the training efficiency and the potential for swift deployment in various applications. Comment: 14 pages, 9 figures
Published: 2024

4. MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance

Author: Wang, X., Fu, Siming, Huang, Qihan, He, Wanggui, and Jiang, Hao
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Recent advancements in text-to-image generation models have dramatically enhanced the generation of photorealistic images from textual prompts, leading to an increased interest in personalized text-to-image applications, particularly in multi-subject scenarios. However, these advances are hindered by two main challenges: firstly, the need to accurately maintain the details of each referenced subject in accordance with the textual descriptions; and secondly, the difficulty in achieving a cohesive representation of multiple subjects in a single image without introducing inconsistencies. To address these concerns, our research introduces the MS-Diffusion framework for layout-guided zero-shot image personalization with multi-subjects. This innovative approach integrates grounding tokens with the feature resampler to maintain detail fidelity among subjects. With the layout guidance, MS-Diffusion further improves the cross-attention to adapt to the multi-subject inputs, ensuring that each subject condition acts on specific areas. The proposed multi-subject cross-attention orchestrates harmonious inter-subject compositions while preserving the control of texts. Comprehensive quantitative and qualitative experiments affirm that this method surpasses existing models in both image and text fidelity, promoting the development of personalized text-to-image generation.
Published: 2024

5. TrainerAgent: Customizable and Efficient Model Training through LLM-Powered Multi-Agent System

Author: Li, Haoyuan, Jiang, Hao, Zhang, Tianke, Yu, Zhelun, Yin, Aoxiong, Cheng, Hao, Fu, Siming, Zhang, Yuhao, and He, Wanggui
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Training AI models has always been challenging, especially when there is a need for custom models to provide personalized services. Algorithm engineers often face a lengthy process to iteratively develop models tailored to specific business requirements, making it even more difficult for non-experts. The quest for high-quality and efficient model development, along with the emergence of Large Language Model (LLM) Agents, has become a key focus in the industry. Leveraging the powerful analytical, planning, and decision-making capabilities of LLM, we propose a TrainerAgent system comprising a multi-agent framework including Task, Data, Model and Server agents. These agents analyze user-defined tasks, input data, and requirements (e.g., accuracy, speed), optimizing them comprehensively from both data and model perspectives to obtain satisfactory models, and finally deploy these models as online service. Experimental evaluations on classical discriminative and generative tasks in computer vision and natural language processing domains demonstrate that our system consistently produces models that meet the desired criteria. Furthermore, the system exhibits the ability to critically identify and reject unattainable tasks, such as fantastical scenarios or unethical requests, ensuring robustness and safety. This research presents a significant advancement in achieving desired models with increased efficiency and quality as compared to traditional model development, facilitated by the integration of LLM-powered analysis, decision-making, and execution capabilities, as well as the collaboration among four agents. We anticipate that our work will contribute to the advancement of research on TrainerAgent in both academic and industry communities, potentially establishing it as a new paradigm for model development in the field of AI.
Published: 2023

6. Uniformly Distributed Category Prototype-Guided Vision-Language Framework for Long-Tail Recognition

Author: Fu, Siming, He, Xiaoxuan, Ding, Xinpeng, Cao, Yuchen, and Wang, Hualiang
Subjects: Computer Science - Computer Vision and Pattern Recognition, 14J60 (Primary) 14F05, 14J26 (Secondary), I.4.10
Abstract: Recently, large-scale pre-trained vision-language models have presented benefits for alleviating class imbalance in long-tailed recognition. However, the long-tailed data distribution can corrupt the representation space, where the distance between head and tail categories is much larger than the distance between two tail categories. This uneven feature space distribution causes the model to exhibit unclear and inseparable decision boundaries on the uniformly distributed test set, which lowers its performance. To address these challenges, we propose the uniformly category prototype-guided vision-language framework to effectively mitigate feature space bias caused by data imbalance. Especially, we generate a set of category prototypes uniformly distributed on a hypersphere. Category prototype-guided mechanism for image-text matching makes the features of different classes converge to these distinct and uniformly distributed category prototypes, which maintain a uniform distribution in the feature space, and improve class boundaries. Additionally, our proposed irrelevant text filtering and attribute enhancement module allows the model to ignore irrelevant noisy text and focus more on key attribute information, thereby enhancing the robustness of our framework. In the image recognition fine-tuning stage, to address the positive bias problem of the learnable classifier, we design the class feature prototype-guided classifier, which compensates for the performance of tail classes while maintaining the performance of head classes. Our method outperforms previous vision-language methods for long-tailed learning work by a large margin and achieves state-of-the-art performance. Comment: 11 pages, 5 figures
Published: 2023

7. Towards Calibrated Hyper-Sphere Representation via Distribution Overlap Coefficient for Long-tailed Learning

Author: Wang, Hualiang, Fu, Siming, He, Xiaoxuan, Fang, Hangxiang, Liu, Zuozhu, and Hu, Haoji
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Long-tailed learning aims to tackle the crucial challenge that head classes dominate the training procedure under severe class imbalance in real-world scenarios. However, little attention has been given to how to quantify the dominance severity of head classes in the representation space. Motivated by this, we generalize the cosine-based classifiers to a von Mises-Fisher (vMF) mixture model, denoted as vMF classifier, which enables to quantitatively measure representation quality upon the hyper-sphere space via calculating distribution overlap coefficient. To our knowledge, this is the first work to measure representation quality of classifiers and features from the perspective of distribution overlap coefficient. On top of it, we formulate the inter-class discrepancy and class-feature consistency loss terms to alleviate the interference among the classifier weights and align features with classifier weights. Furthermore, a novel post-training calibration algorithm is devised to zero-costly boost the performance via inter-class overlap coefficients. Our method outperforms previous work with a large margin and achieves state-of-the-art performance on long-tailed image classification, semantic segmentation, and instance segmentation tasks (e.g., we achieve 55.0% overall accuracy with ResNetXt-50 in ImageNet-LT). Our code is available at https://github.com/VipaiLab/vMF_OP.
Published: 2022

8. SemiGMMPoint: Semi-supervised point cloud segmentation based on Gaussian mixture models

Author: Zhuang, Xianwei, Wang, Hualiang, He, Xiaoxuan, Fu, Siming, and Hu, Haoji
Published: 2025
Full Text: View/download PDF

9. Meta-prototype Decoupled Training for Long-Tailed Learning

Author: Fu, Siming, Chu, Huanpeng, He, Xiaoxuan, Wang, Hualiang, Yang, Zhenyu, and Hu, Haoji
Editors: Wang, Lei, Gall, Juergen, Chin, Tat-Jun, Sato, Imari, and Chellappa, Rama
Series editors: Goos, Gerhard (Founding Editor), Hartmanis, Juris (Founding Editor), Bertino, Elisa (Editorial Board Member), Gao, Wen (Editorial Board Member), Steffen, Bernhard (Editorial Board Member), and Yung, Moti (Editorial Board Member)
Published: 2023
Full Text: View/download PDF

10. Class semantic enhancement network for semantic segmentation

Author: Fu, Siming, Wang, Hualiang, Hu, Haoji, He, Xiaoxuan, Long, Yongwen, Bai, Jianhong, Ou, Yangtao, Huang, Yuanjia, and Zhou, Mengqiu
Published: 2023
Full Text: View/download PDF

11. AuxBranch: Binarization residual-aware network design via auxiliary branch search

Author: Fu, Siming, Chu, Huanpeng, Yu, Lu, Peng, Bo, Li, Zheyang, Tan, Wenming, and Hu, Haoji
Published: 2023
Full Text: View/download PDF

12. Meta-prototype Decoupled Training for Long-Tailed Learning

Author: Fu, Siming, primary, Chu, Huanpeng, additional, He, Xiaoxuan, additional, Wang, Hualiang, additional, Yang, Zhenyu, additional, and Hu, Haoji, additional
Published: 2023
Full Text: View/download PDF

13. Ltb-Solver: Long-Tailed Bias Solver for Image Synthesis of Diffusion Models

Author: He, Xiaoxuan, primary, Fu, Siming, additional, and Hu, Haoji, additional
Published: 2024
Full Text: View/download PDF

14. NDP-MSH activates melanocortin-3 receptor to attenuate oxidative stress and neuronal apoptosis in mice with intracerebral hemorrhage via PKC/ERK signaling pathway

Author: Fu, Siming, Wu, Xuan, and Xie, Zongyi
Subjects: intracerebral hemorrhage, oxidative stress, neuronal apoptosis, nle4-msh, melanocortin-3 receptor, Medicine (General), R5-920
Abstract: Objective: To investigate the inhibitory effect of Nle4-D-Phe7-α-MSH (NDP-MSH), which binds to the melanocortin-3 receptor (Mc3r), on oxidative stress and neuronal apoptosis in mice with intracerebral hemorrhage (ICH). Methods: A total of 170 male mice were randomly assigned to a sham-operated group (n=22), an ICH model group (n=46), and an ICH+NDP-MSH group (n=102). The modified Garcia score, beam balance score, and brain water content of the mice were measured 24 h after ICH. A dual-label immunofluorescence assay was used to detect the colocalization of Mc3r and the neuronal marker NeuN, and ELISA was used to detect the contents of MDA, SOD, and CAT in the tissues surrounding the cerebral hematoma. Western blotting was performed to detect the expression of Mc3r, PKC, ERK1/2, Bcl-2, and caspase-3 in the brain tissue of the mice. Results: The peak level of Mc3r expression in the brain tissue of the mice occurred 24 h after ICH, with a diffuse distribution in the neurons. Treatment with NDP-MSH significantly increased the modified Garcia score and beam balance score (P < 0.05), decreased water content in the ipsilateral basal ganglia and the cortex (P < 0.05), and lowered ROS activity in the brain tissue of the mice with ICH (P < 0.05). Down-regulation of the expression of Mc3r and PKC using an Mc3r siRNA and staurosporine (a PKC inhibitor), respectively, significantly decreased the expression of p-PKC, p-ERK1/2, and Bcl-2 and enhanced the expression of c-caspase-3 in the brain tissue of the mice (P < 0.05). Conclusion: NDP-MSH, by binding to Mc3r, ameliorates oxidative stress and neuronal apoptosis in mice with ICH via the PKC/ERK signaling pathway.
Published: 2020
Full Text: View/download PDF

15. Uniformly Distributed Category Prototype-Guided Vision-Language Framework for Long-Tail Recognition

Author: He, Xiaoxuan, primary, Fu, Siming, additional, Ding, Xinpeng, additional, Cao, Yuchen, additional, and Wang, Hualiang, additional
Published: 2023
Full Text: View/download PDF

16. Uniformly Distributed Category Prototype-Guided Vision-Language Framework for Long-Tail Recognition

Author: He, Xiaoxuan, Fu, Siming, Ding, Xinpeng, Cao, Yuchen, and Wang, Hualiang
Abstract: Recently, large-scale pre-trained vision-language models have presented benefits for alleviating class imbalance in long-tailed recognition. However, the long-tailed data distribution can corrupt the representation space, where the distance between head and tail categories is much larger than the distance between two tail categories. This uneven feature space distribution causes the model to exhibit unclear and inseparable decision boundaries on the uniformly distributed test set, which lowers its performance. To address these challenges, we propose the uniformly category prototype-guided vision-language framework to effectively mitigate feature space bias caused by data imbalance. Especially, we generate a set of category prototypes uniformly distributed on a hypersphere. Category prototype-guided mechanism for image-text matching makes the features of different classes converge to these distinct and uniformly distributed category prototypes, which maintain a uniform distribution in the feature space, and improve class boundaries. Additionally, our proposed irrelevant text filtering and attribute enhancement module allows the model to ignore irrelevant noisy text and focus more on key attribute information, thereby enhancing the robustness of our framework. In the image recognition fine-tuning stage, to address the positive bias problem of the learnable classifier, we design the class feature prototype-guided classifier, which compensates for the performance of tail classes while maintaining the performance of head classes. Our method outperforms previous vision-language methods for long-tailed learning work by a large margin and achieves state-of-the-art performance. © 2023 ACM.
Published: 2023

17. NDP-MSH binding melanocortin-1 receptor ameliorates neuroinflammation and BBB disruption through CREB/Nr4a1/NF-κB pathway after intracerebral hemorrhage in mice

Author: Wu, Xuan, Fu, Siming, Liu, Yun, Luo, Hansheng, Li, Feng, Wang, Yiying, Gao, Meng, Cheng, Yuan, and Xie, Zongyi
Published: 2019
Full Text: View/download PDF

18. Unlocking the Power of Diffusion Probabilistic Models for Long-Tailed Recognition via Data Synthesis

Author: Fu, Siming, primary, He, Xiaoxuan, additional, and Hu, Haoji, additional
Published: 2023
Full Text: View/download PDF

19. Baseline-auxiliary Network Architecture Design Scheme to Compensate for Binarization Residual Errors

Author: Fu, Siming, primary, Ni, Tian, additional, and Hu, Haoji, additional
Published: 2022
Full Text: View/download PDF

20. Meta-BNS for Adversarial Data-Free Quantization

Author: Fu, Siming, primary, Wang, Hualiang, additional, Cao, Yuchen, additional, Hu, Haoji, additional, Peng, Bo, additional, Tan, Wenming, additional, and Ye, Tingqun, additional
Published: 2022
Full Text: View/download PDF

21. Renovate Yourself: Calibrating Feature Representation of Misclassified Pixels for Semantic Segmentation

Author: Wang, Hualiang, primary, Chu, Huanpeng, additional, Fu, Siming, additional, Liu, Zuozhu, additional, and Hu, Haoji, additional
Published: 2022
Full Text: View/download PDF

22. Comparatively investigation of real MSW and biomass tar treatment by a rotating gliding arc

Author: Kong, Xiangzhi, primary, Wu, Angjian, additional, Fu, Siming, additional, Xu, Ruiyang, additional, Zhao, Yucheng, additional, Li, Xiaodong, additional, and Yan, Jianhua, additional
Published: 2021
Full Text: View/download PDF

23. Activation of the Melanocortin-1 Receptor by NDP-MSH Attenuates Oxidative Stress and Neuronal Apoptosis through PI3K/Akt/Nrf2 Pathway after Intracerebral Hemorrhage in Mice

Author: Fu, Siming, primary, Luo, Xu, additional, Wu, Xuan, additional, Zhang, Tongyu, additional, Gu, Linggui, additional, Wang, Yiying, additional, Gao, Meng, additional, Cheng, Yuan, additional, and Xie, Zongyi, additional
Published: 2020
Full Text: View/download PDF
