1. VisionCoder: Empowering Multi-Agent Auto-Programming for Image Processing with Hybrid LLMs
- Author
- Zhao, Zixiao; Sun, Jing; Wei, Zhiyuan; Cai, Cheng-Hao; Hou, Zhe; and Dong, Jin Song
- Subjects
- Computer Science - Software Engineering, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Multiagent Systems
- Abstract
- In the field of automated programming, large language models (LLMs) have demonstrated foundational generative capabilities when given detailed task descriptions. However, their current functionalities are primarily limited to function-level development, restricting their effectiveness in complex project environments and specific application scenarios, such as complicated image-processing tasks. This paper presents a multi-agent framework that utilises a hybrid set of LLMs, including GPT-4o and locally deployed open-source models, which collaboratively complete auto-programming tasks. Each agent plays a distinct role in the software development cycle, collectively forming a virtual organisation that works together to produce software products. By establishing a tree-structured thought distribution and development mechanism across project, module, and function levels, this framework offers a cost-effective and efficient solution for code generation. We evaluated our approach using benchmark datasets, and the experimental results demonstrate that VisionCoder significantly outperforms existing methods in image processing auto-programming tasks.
- Published
- 2024