Let AI be your browser operator.
Midscene.js lets AI be your browser operator 🤖. Just describe what you want to do in natural language, and it will help you operate web pages, validate content, and extract data. Whether you want a quick trial or in-depth development, you can get started easily.
The following recorded example videos use the UI-TARS 7B SFT model and have not been sped up at all.
Instruction | Video |
---|---|
Post a Tweet | |
Use JS code to drive task orchestration, collect information about Jay Chou's concert, and write it into Google Docs | |
Besides the default model GPT-4o, we have added two new recommended open-source models to Midscene.js: UI-TARS and Qwen2.5-VL. (Yes, open source!) They are dedicated models for image recognition and UI automation, and are known for performing well in UI automation scenarios. Read more in Choose a model.
Open-source models: `UI-TARS` and `Qwen2.5-VL` outperform closed-source models like GPT-4o and Claude in UI automation scenarios while better protecting data security.
General-purpose models: `gpt-4o` works well for most cases. `gemini-1.5-pro` and `qwen-vl-max-latest` are also supported.
Dedicated model: `UI-TARS` is an open-source model dedicated to UI automation. You can deploy it on your own server, which dramatically improves performance and data privacy.
There are so many UI automation tools out there, and each one seems to be all-powerful. What's special about Midscene.js?
Debugging Experience: You will soon find that debugging and maintaining automation scripts is the real challenge. No matter how magical the demo is, you still need to debug the process to keep it stable over time. Midscene.js offers a visualized report file, a built-in playground, and a Chrome Extension for debugging the entire process. This is what most developers really need, and we're continuing to improve the debugging experience.
Open Source, Free, Deploy as You Want: Midscene.js is an open-source project. It's decoupled from any cloud service or model provider, so you can choose either public or private deployment. There is always a suitable plan for your business.
Integrate with JavaScript: You can always bet on JavaScript.
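As a rough illustration of what the JavaScript integration can look like, here is a minimal TypeScript sketch of code-driven task orchestration: operate the page, extract data, then validate content, each step phrased in natural language. The `BrowserAgent` interface, the `Concert` type, `collectConcerts`, and `RecordingAgent` are hypothetical names introduced for this example. The method names `aiAction`, `aiQuery`, and `aiAssert` are modeled on Midscene.js's agent methods, but the exact signatures here are assumptions and should be checked against the official documentation.

```typescript
// Hypothetical interface: the method names mirror Midscene.js's agent methods,
// but these signatures are assumptions made for the sake of the sketch.
interface BrowserAgent {
  aiAction(instruction: string): Promise<void>;
  aiQuery<T>(dataDemand: string): Promise<T>;
  aiAssert(assertion: string): Promise<void>;
}

// Hypothetical shape of the data we ask the model to extract.
interface Concert {
  city: string;
  date: string;
}

// Orchestrates one task in plain code: operate the page, extract structured
// data, then validate what is on screen. Every step is a natural-language prompt.
async function collectConcerts(
  agent: BrowserAgent,
  artist: string,
): Promise<Concert[]> {
  await agent.aiAction(
    `type "${artist} concert" in the search box, then press Enter`,
  );
  const concerts = await agent.aiQuery<Concert[]>(
    "{city: string, date: string}[], the concerts listed in the search results",
  );
  await agent.aiAssert("the page shows a list of search results");
  return concerts;
}

// Stand-in agent so the sketch runs without a browser or a model.
// It records each call and returns canned data from aiQuery.
class RecordingAgent implements BrowserAgent {
  calls: string[] = [];
  cannedData: unknown;

  constructor(cannedData: unknown) {
    this.cannedData = cannedData;
  }

  async aiAction(instruction: string): Promise<void> {
    this.calls.push(`action: ${instruction}`);
  }

  async aiQuery<T>(dataDemand: string): Promise<T> {
    this.calls.push(`query: ${dataDemand}`);
    return this.cannedData as T;
  }

  async aiAssert(assertion: string): Promise<void> {
    this.calls.push(`assert: ${assertion}`);
  }
}
```

In a real script, the stub would be replaced by an agent provided by Midscene.js (for example, one wrapping a Puppeteer or Playwright page). Keeping the orchestration behind a small interface like this also makes the flow easy to exercise in unit tests without calling a model.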
If you use Midscene.js in your research or project, please cite:
@software{Midscene.js,
author = {Zhou, Xiao and Yu, Tao},
title = {Midscene.js: Let AI be your browser operator.},
year = {2025},
publisher = {GitHub},
url = {https://github.com/web-infra-dev/midscene}
}
Midscene.js is MIT licensed.