[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
Open-source, End-to-end, Lightweight, Vision-Language-Action model for GUI Agent & Computer Use.
ShowUI 是一款开源的、端到端、轻量级的视觉-语言-动作模型,专为 GUI 智能体设计。
   📑 Paper   
| 🤗 Hugging Models  
|    🤗 Spaces Demo   
|    📝 Slides   
|    🕹️ OpenBayes贝式计算 Demo
🤗 Datasets   |   💬 X (Twitter)  
|    🖥️ Computer Use   
|    📖 GUI Paper List   
|    🤖 ModelScope
ShowUI: One Vision-Language-Action Model for GUI Visual Agent
Kevin Qinghong Lin, Linjie Li, Difei Gao, Zhengyuan Yang, Shiwei Wu, Zechen Bai, Weixian Lei, Lijuan Wang, Mike Zheng Shou
Show Lab @ National University of Singapore, Microsoft
python3 api.py
.ShowUI-web
dataset.showui
for UI-guided token selection implementation.ShowUI-desktop
.showlab/ShowUI-2B
is available at huggingface.See inference_vllm.ipynb for vllm inference.
To leverage multiple GPUs for faster inference, you can adjust the gpu_num parameter
Run python3 api.py
by providing a screenshot and a query.
Since we are based on huggingface gradio client, you don’t need a GPU to deploy the model locally 🤗
See Computer Use OOTB for using ShowUI to control your PC.
https://github.com/user-attachments/assets/f50b7611-2350-4712-af9e-3d31e30020ee
See Quick Start for local model usage.
See Gradio for installation.
Our Training codebases supports:
See Train for training set up.
Try test.ipynb
, which seamless support for Qwen2VL models.
Try recaption.ipynb
, where we provide instructions on how to recaption the original annotations using GPT-4o.
We extend our gratitude to SeeClick for providing their codes and datasets.
Special thanks to Siyuan for assistance with the Gradio demo and OOTB support.
If you find our work helpful, please kindly consider citing our paper.
@misc{lin2024showui,
title={ShowUI: One Vision-Language-Action Model for GUI Visual Agent},
author={Kevin Qinghong Lin and Linjie Li and Difei Gao and Zhengyuan Yang and Shiwei Wu and Zechen Bai and Weixian Lei and Lijuan Wang and Mike Zheng Shou},
year={2024},
eprint={2411.17465},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2411.17465},
}
If you like our project, please give us a star ⭐ on GitHub for the latest update.