A UI-Focused Agent for Windows OS Interaction.
Turn natural‑language requests into automatic, reliable, multi‑application workflows on Windows, going beyond UI‑focused automation.
Deep OS Integration | Picture‑in‑Picture Desktop (coming soon) | Hybrid GUI + API Actions |
---|---|---|
Combines Windows UIA, Win32 and WinCOM for first‑class control detection and native commands. | Automation runs in a sandboxed virtual desktop so you can keep using your main screen. | Chooses native APIs when available, falls back to clicks/keystrokes when not—fast and robust. |
Speculative Multi‑Action | Continuous Knowledge Substrate | UIA + Visual Control Detection |
---|---|---|
Bundles several predicted steps into one LLM call, validated live, for up to 51% fewer queries. | Mixes docs, Bing search, user demos and execution traces via RAG for agents that learn over time. | Detects standard and custom controls with a hybrid UIA + vision pipeline. |
See the documentation for full details.
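To make the "Speculative Multi‑Action" entry in the table above more concrete, here is a minimal sketch of the general idea. It is not UFO²'s implementation: `Step`, `run_speculative_batch`, and the dictionary standing in for live UI state are hypothetical names introduced only for illustration. A batch of predicted steps is executed in order, each one is checked against the current state first, and execution stops at the first step whose expected control is not available.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for live UI state: control name -> currently available?
UIState = dict[str, bool]


@dataclass
class Step:
    """One predicted action. Fields are illustrative, not UFO's schema."""
    control: str                       # control the step expects to act on
    action: Callable[[UIState], None]  # mutates the simulated UI state


def run_speculative_batch(steps: list[Step], state: UIState) -> int:
    """Run predicted steps in order, validating each against the live state.

    Returns the number of steps executed. A caller would ask the LLM to
    re-plan from the first step that failed validation.
    """
    executed = 0
    for step in steps:
        if not state.get(step.control, False):
            break  # the prediction no longer matches the UI; stop here
        step.action(state)
        executed += 1
    return executed


def open_dialog(state: UIState) -> None:
    state["Save dialog"] = True


def confirm_save(state: UIState) -> None:
    state["Save dialog"] = False


state: UIState = {"File menu": True, "Save dialog": False}
batch = [
    Step("File menu", open_dialog),
    Step("Save dialog", confirm_save),
    Step("Print dialog", lambda s: None),  # never becomes available
]
ran = run_speculative_batch(batch, state)
print(f"executed {ran} of {len(batch)} predicted steps")
```

In this toy run the first two steps pass validation and the third does not, so only one planning call was needed for two actions.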
UFO² operates as a Desktop AgentOS, encompassing a multi-agent framework that includes:
For a deep dive see our technical report or the docs site.
UFO sightings have garnered attention from various media outlets, including:
These sources provide insights into the evolving landscape of technology and the implications of UFO phenomena on various platforms.
UFO requires Python >= 3.10 running on Windows OS >= 10. It can be installed by running the following commands:
# [optional to create conda environment]
# conda create -n ufo python=3.10
# conda activate ufo
# clone the repository
git clone https://github.com/microsoft/UFO.git
cd UFO
# install the requirements
pip install -r requirements.txt
# If you want to use Qwen as your LLM, uncomment the related libs.
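Before installing, it can help to confirm that the machine meets the stated requirements. The following is a small sketch, not part of UFO itself; `check_environment` is a hypothetical helper, and it relies only on the standard-library `sys` and `platform` modules.

```python
import platform
import sys

MIN_PYTHON = (3, 10)       # from the stated requirement "Python >= 3.10"
MIN_WINDOWS_RELEASE = 10   # from the stated requirement "Windows OS >= 10"


def check_environment(py_version=None, system=None, release=None):
    """Return a list of problems; an empty list means the requirements look met.

    Arguments default to the running interpreter and OS. Pass explicit
    values to test the function on another platform.
    """
    py_version = tuple(sys.version_info[:2]) if py_version is None else tuple(py_version)
    system = platform.system() if system is None else system
    release = platform.release() if release is None else release

    problems = []
    if py_version < MIN_PYTHON:
        problems.append(f"Python {py_version[0]}.{py_version[1]} found, need >= 3.10")
    if system != "Windows":
        problems.append(f"Running on {system or 'an unknown OS'}, need Windows")
    else:
        # On Windows, platform.release() is usually a string such as "10" or "11".
        # Values that do not parse as a number are left unflagged.
        try:
            major = int(release.split(".")[0])
        except ValueError:
            major = None
        if major is not None and major < MIN_WINDOWS_RELEASE:
            problems.append(f"Windows {release} found, need >= 10")
    return problems


for problem in check_environment():
    print("WARNING:", problem)
```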
Before running UFO, you need to provide your LLM configurations individually for HostAgent and AppAgent. Create your own config file ufo/config/config.yaml by copying ufo/config/config.yaml.template and editing the HOST_AGENT and APP_AGENT settings as follows:
copy ufo\config\config.yaml.template ufo\config\config.yaml
notepad ufo\config\config.yaml # paste your key & endpoint
VISUAL_MODE: True, # Whether to use the visual mode
API_TYPE: "openai" , # The API type, "openai" for the OpenAI API.
API_BASE: "https://api.openai.com/v1/chat/completions", # The OpenAI API endpoint.
API_KEY: "sk-", # The OpenAI API key, begin with sk-
API_VERSION: "2024-02-15-preview", # "2024-02-15-preview" by default
API_MODEL: "gpt-4-vision-preview", # The only OpenAI model
VISUAL_MODE: True, # Whether to use the visual mode
API_TYPE: "aoai" , # The API type, "aoai" for the Azure OpenAI.
API_BASE: "YOUR_ENDPOINT", # The AOAI API address. Format: https://{your-resource-name}.openai.azure.com
API_KEY: "YOUR_KEY", # The aoai API key
API_VERSION: "2024-02-15-preview", # "2024-02-15-preview" by default
API_MODEL: "gpt-4-vision-preview", # The only OpenAI model
API_DEPLOYMENT_ID: "YOUR_AOAI_DEPLOYMENT", # The deployment id for the AOAI API
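A quick sanity check on the edited settings can catch leftover template placeholders before the first run. The sketch below is an assumption-laden illustration, not UFO's own validation: `find_config_problems` is a hypothetical helper, and the list of required keys per `API_TYPE` is inferred only from the two excerpts above. It works on a plain dictionary; loading the YAML file (for example with PyYAML's `yaml.safe_load`) is left out to keep the example standard-library only.

```python
# Keys assumed to be required, based only on the template excerpts shown above.
REQUIRED_KEYS = {
    "openai": ["API_TYPE", "API_BASE", "API_KEY", "API_VERSION", "API_MODEL"],
    "aoai": ["API_TYPE", "API_BASE", "API_KEY", "API_VERSION", "API_MODEL",
             "API_DEPLOYMENT_ID"],
}

# Placeholder values that appear in the template excerpts.
PLACEHOLDERS = {"sk-", "YOUR_ENDPOINT", "YOUR_KEY", "YOUR_AOAI_DEPLOYMENT"}


def find_config_problems(agent_name: str, agent_cfg: dict) -> list[str]:
    """Return human-readable problems for one agent section (e.g. HOST_AGENT)."""
    api_type = agent_cfg.get("API_TYPE")
    if api_type not in REQUIRED_KEYS:
        # Other API types exist, but this sketch only knows the two shown above.
        return [f"{agent_name}: API_TYPE {api_type!r} is not covered by this sketch"]

    problems = []
    for key in REQUIRED_KEYS[api_type]:
        value = agent_cfg.get(key)
        if value is None or value == "":
            problems.append(f"{agent_name}: {key} is missing")
        elif isinstance(value, str) and value in PLACEHOLDERS:
            problems.append(f"{agent_name}: {key} still holds the placeholder {value!r}")
    return problems


# Example: an AOAI section with an unedited key and no deployment id.
host_agent = {
    "VISUAL_MODE": True,
    "API_TYPE": "aoai",
    "API_BASE": "https://example-resource.openai.azure.com",  # made-up endpoint
    "API_KEY": "YOUR_KEY",
    "API_VERSION": "2024-02-15-preview",
    "API_MODEL": "gpt-4-vision-preview",
}
problems = find_config_problems("HOST_AGENT", host_agent)
for line in problems:
    print(line)
```

Running the same check on both the HOST_AGENT and APP_AGENT sections keeps the two configurations consistent.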
Need Qwen, Gemini, non‑visual GPT‑4, or even OpenAI CUA Operator as an AppAgent? See the model guide.
If you want to enhance UFO’s ability with external knowledge, you can optionally configure an external database for retrieval-augmented generation (RAG) in the ufo/config/config.yaml file.
We provide the following options for RAG to enhance UFO’s capabilities:
Consult their respective documentation for more information on how to configure these settings.
# assume you are in the cloned UFO folder
python -m ufo --task <your_task_name>
This will start the UFO process and you can interact with it through the command line interface.
If everything goes well, you will see the following message:
Welcome to use UFO🛸, A UI-focused Agent for Windows OS Interaction.
_ _ _____ ___
| | | || ___| / _ \
| | | || |_ | | | |
| |_| || _| | |_| |
\___/ |_| \___/
Please enter your request to be completed🛸:
Alternatively, you can invoke UFO directly with a specific task and request by using the following command:
python -m ufo --task <your_task_name> -r "<your_request>"
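If you want to launch UFO from another Python script, the two command forms above can be wrapped with `subprocess`. This is a sketch under stated assumptions: `build_ufo_command` and `run_ufo` are hypothetical helper names, and only the `--task` and `-r` options shown above are used.

```python
import subprocess
import sys


def build_ufo_command(task_name, request=None):
    """Build the argument list for `python -m ufo`, using only --task and -r."""
    cmd = [sys.executable, "-m", "ufo", "--task", task_name]
    if request is not None:
        cmd += ["-r", request]  # passed as one argument, so no shell quoting is needed
    return cmd


def run_ufo(task_name, request=None, repo_dir="."):
    """Run UFO from the cloned repository folder and return its exit code."""
    completed = subprocess.run(build_ufo_command(task_name, request), cwd=repo_dir)
    return completed.returncode


# Only build and show the command here; call run_ufo(...) to actually start UFO.
cmd = build_ufo_command("demo_task", "Open Notepad and type hello")
print(" ".join(cmd[1:]))
```

Passing the request as a list element avoids quoting problems with requests that contain spaces or quotation marks.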
You can find the screenshots taken and request & response logs in the following folder:
./ufo/logs/<your_task_name>/
You may use them to debug, replay, or analyze the agent output.
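As a starting point for such analysis, the sketch below counts the files in a task's log folder by extension. `summarize_task_logs` is a hypothetical helper, and the file names in the demo are made up; the actual names and formats of UFO's log files are not assumed here.

```python
import tempfile
from pathlib import Path


def summarize_task_logs(task_name, logs_root="ufo/logs"):
    """Count files under <logs_root>/<task_name>/ grouped by file extension."""
    task_dir = Path(logs_root) / task_name
    if not task_dir.is_dir():
        return {}  # no logs for this task yet
    counts = {}
    for path in task_dir.rglob("*"):
        if path.is_file():
            suffix = path.suffix.lower() or "(no extension)"
            counts[suffix] = counts.get(suffix, 0) + 1
    return counts


# Demo on a throwaway directory so the sketch runs without a real UFO log folder.
with tempfile.TemporaryDirectory() as root:
    demo_dir = Path(root) / "demo_task"
    demo_dir.mkdir()
    for name in ("step1.png", "step2.png", "session.log"):  # made-up file names
        (demo_dir / name).write_text("placeholder")
    demo_counts = summarize_task_logs("demo_task", logs_root=root)

print(demo_counts)
```

For a real run, call `summarize_task_logs("<your_task_name>")` from the cloned UFO folder.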
UFO² is rigorously benchmarked on two publicly‑available live‑task suites:
Benchmark | Scope | Documents |
---|---|---|
Windows Agent Arena (WAA) | 154 real Windows tasks across 15 applications (Office, Edge, File Explorer, VS Code, …) | https://microsoft.github.io/UFO/benchmark/windows_agent_arena/ |
OSWorld (Windows) | 49 cross‑application tasks that mix Office 365, browser and system utilities | https://microsoft.github.io/UFO/benchmark/osworld |
The integrations of these benchmarks with UFO² are maintained in separate repositories. Please follow the documents linked above for more details.
If you build on this work, please cite the AgentOS framework:
UFO² – The Desktop AgentOS (2025)
https://arxiv.org/abs/2504.14603
@article{zhang2025ufo2,
title = {{UFO2: The Desktop AgentOS}},
author = {Zhang, Chaoyun and Huang, He and Ni, Chiming and Mu, Jian and Qin, Si and He, Shilin and Wang, Lu and Yang, Fangkai and Zhao, Pu and Du, Chao and Li, Liqun and Kang, Yu and Jiang, Zhao and Zheng, Suzhen and Wang, Rujia and Qian, Jiaxu and Ma, Minghua and Lou, Jian-Guang and Lin, Qingwei and Rajmohan, Saravan and Zhang, Dongmei},
journal = {arXiv preprint arXiv:2504.14603},
year = {2025}
}
UFO – A UI‑Focused Agent for Windows OS Interaction (2024)
https://arxiv.org/abs/2402.07939
@article{zhang2024ufo,
title = {{UFO: A UI-Focused Agent for Windows OS Interaction}},
author = {Zhang, Chaoyun and Li, Liqun and He, Shilin and Zhang, Xu and Qiao, Bo and Qin, Si and Ma, Minghua and Kang, Yu and Lin, Qingwei and Rajmohan, Saravan and Zhang, Dongmei and Zhang, Qi},
journal = {arXiv preprint arXiv:2402.07939},
year = {2024}
}
The UFO² team is actively working on the following features and improvements:
By choosing to run the provided code, you acknowledge and agree to the following terms and conditions regarding the functionality and data handling practices in DISCLAIMER.md.
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft
trademarks or logos is subject to and must follow
Microsoft’s Trademark & Brand Guidelines.
Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.
Any use of third-party trademarks or logos is subject to those third parties’ policies.
This repository is released under the MIT License (SPDX‑Identifier: MIT).
See DISCLAIMER.md for privacy & safety notices.
© Microsoft 2025 • UFO² is an open‑source project, not an official Windows feature.