Multimodal AI agent from ByteDance. GUI agent with vision for browsers, terminals, and desktops.
Agent TARS (UI-TARS Desktop) is an open-source multimodal AI agent stack from ByteDance. It brings GUI agent and vision capabilities to terminals, computers, and browsers, with MCP tool integration. The agent can see your screen, identify UI elements, and take actions such as clicking buttons, filling forms, and navigating applications. It supports both cloud and local model backends for visual understanding.
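
The see, understand, act cycle described above can be illustrated with a small toy loop. This is only a sketch under stated assumptions, not Agent TARS's actual API: `FakeScreen`, `Element`, `Action`, `decide`, and `run` are hypothetical names, the "screen" is an in-memory dictionary rather than a screenshot, and a hard-coded rule stands in for the vision model.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """A UI element as a vision model might describe it (hypothetical shape)."""
    name: str
    kind: str          # "button" or "input"
    value: str = ""


@dataclass
class Action:
    """One step the agent takes on the screen (hypothetical shape)."""
    kind: str          # "click" or "type"
    target: str
    text: str = ""


@dataclass
class FakeScreen:
    """Stand-in for a real desktop or browser: holds elements and a click log."""
    elements: dict = field(default_factory=dict)
    clicked: list = field(default_factory=list)

    def observe(self):
        # A real agent would capture a screenshot here; this toy returns elements directly.
        return list(self.elements.values())

    def apply(self, action):
        element = self.elements[action.target]
        if action.kind == "type":
            element.value = action.text
        elif action.kind == "click":
            self.clicked.append(element.name)
        else:
            raise ValueError(f"unknown action kind: {action.kind}")


def decide(elements, goal):
    """Stub policy in place of a vision model: fill goal inputs, then click submit."""
    for el in elements:
        if el.kind == "input" and el.name in goal and el.value != goal[el.name]:
            return Action("type", el.name, goal[el.name])
    return Action("click", "submit")


def run(screen, goal, max_steps=10):
    """Observe, decide, and act until submit is clicked or the step budget runs out."""
    history = []
    for _ in range(max_steps):
        action = decide(screen.observe(), goal)
        screen.apply(action)
        history.append(action)
        if action.kind == "click" and action.target == "submit":
            break
    return history


screen = FakeScreen({
    "email": Element("email", "input"),
    "submit": Element("submit", "button"),
})
history = run(screen, {"email": "user@example.com"})
for step in history:
    print(step.kind, step.target, step.text)
```

In a real GUI agent, `observe` would return a screenshot, `decide` would call a cloud or local vision model, and `apply` would drive the mouse and keyboard. The step budget (`max_steps`) is a common safeguard against an agent that never reaches its goal.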