Back to Marketplace

OmniParser

RAPR CLI connectorBrowser & Desktop Automation

Microsoft screen parsing for vision-based GUI agents and computer-use workflows.

By Microsoft / RAPR AIv1.0.0Package license: MITFree & Open Source; model checkpoints have their own licenses

OmniParser package details

computer-usevisionscreen-parsinggui-agentwindows

RAPR CLI connector scope

This is RAPR-authored connector guidance for a command-line tool the user installs locally. The upstream CLI remains governed by its own license and terms. This package contains RAPR-authored CLI usage guidance, install commands, and agent instructions. It does not bundle the upstream CLI binary. Users can also go directly to the public upstream source linked on this page.

How to get started

Install RAPR AI

Download and install RAPR AI on your computer

Find in Marketplace

Open RAPR AI, go to Packages, and browse the marketplace

Install from Marketplace

Click Install. RAPR sets up the wrapper package, connector guidance, or skill instructions for this listing.

OmniParser

OmniParser is Microsoft's screen parsing tool for pure vision-based GUI agents. It parses UI screenshots into structured, easy-to-understand elements so agents can reason about screen regions before acting.

What you can do

  • Parse screenshots: Detect UI elements, text, icons, and interactable regions.
  • Ground computer-use agents: Convert visual screens into structured regions for downstream actions.
  • Add visual fallback: Help when DOM selectors or Windows UI Automation trees are unavailable.
  • Prototype agent perception: Pair with Browser Harness, pywinauto/UIAutomation, Playwright, or another controller.

Windows Setup

The RAPR skill installs immediately. Full OmniParser setup is manual because it requires a repo clone, Python/conda environment, model weights, and possibly GPU/CUDA choices.

$dest = Join-Path $env:USERPROFILE "Developer\OmniParser"
git clone https://github.com/microsoft/OmniParser.git $dest
cd $dest
conda create -n omni python==3.12
conda activate omni
pip install -r requirements.txt

Then download V2 weights into weights using the upstream README instructions.

Example Requests

  • "Parse this screenshot and identify the clickable UI regions."
  • "Use OmniParser as the perception layer for a desktop automation workflow."
  • "Compare OmniParser output with the Windows UIA tree and choose the safer action target."

Keep screenshots local unless the user approves upload. Review model checkpoint licenses before redistribution or commercial deployment.

Ready to try OmniParser?

Download RAPR AI and connect OmniParser in seconds.