OmniParser
RAPR CLI connectorBrowser & Desktop AutomationMicrosoft screen parsing for vision-based GUI agents and computer-use workflows.
OmniParser package details
RAPR CLI connector scope
This is RAPR-authored connector guidance for a command-line tool the user installs locally. The upstream CLI remains governed by its own license and terms. This package contains RAPR-authored CLI usage guidance, install commands, and agent instructions. It does not bundle the upstream CLI binary. Users can also go directly to the public upstream source linked on this page.
How to get started
Install RAPR AI
Download and install RAPR AI on your computer
Find in Marketplace
Open RAPR AI, go to Packages, and browse the marketplace
Install from Marketplace
Click Install. RAPR sets up the wrapper package, connector guidance, or skill instructions for this listing.
OmniParser
OmniParser is Microsoft's screen parsing tool for pure vision-based GUI agents. It parses UI screenshots into structured, easy-to-understand elements so agents can reason about screen regions before acting.
What you can do
- Parse screenshots: Detect UI elements, text, icons, and interactable regions.
- Ground computer-use agents: Convert visual screens into structured regions for downstream actions.
- Add visual fallback: Help when DOM selectors or Windows UI Automation trees are unavailable.
- Prototype agent perception: Pair with Browser Harness, pywinauto/UIAutomation, Playwright, or another controller.
Windows Setup
The RAPR skill installs immediately. Full OmniParser setup is manual because it requires a repo clone, Python/conda environment, model weights, and possibly GPU/CUDA choices.
$dest = Join-Path $env:USERPROFILE "Developer\OmniParser"
git clone https://github.com/microsoft/OmniParser.git $dest
cd $dest
conda create -n omni python==3.12
conda activate omni
pip install -r requirements.txt
Then download V2 weights into weights using the upstream README instructions.
Example Requests
- "Parse this screenshot and identify the clickable UI regions."
- "Use OmniParser as the perception layer for a desktop automation workflow."
- "Compare OmniParser output with the Windows UIA tree and choose the safer action target."
Keep screenshots local unless the user approves upload. Review model checkpoint licenses before redistribution or commercial deployment.
Ready to try OmniParser?
Download RAPR AI and connect OmniParser in seconds.