Vision Exploration (Experimental)

Warning

Vision Exploration is still experimental and is currently supported only on Windows.

What is Vision Exploration?

Vision Exploration is an intelligent desktop automation assistant. It "looks" at screenshots of your screen much as a person would, then automatically plans and executes mouse clicks, keyboard input, and other actions to complete tasks in desktop applications.

Requirements

Your client device must support the following permissions:

  • Screen capture: the Agent needs to see your screen
  • Computer control: the Agent needs to control the computer and simulate mouse and keyboard actions

How to Use It

Just describe the complete goal you want to achieve in natural language. The Agent will plan and execute step by step automatically.

Good examples:

  • "Open Tencent Meeting and create a meeting"
  • "Open DingTalk and send a message to xxx"

Bad examples:

  • "Click the lower-left corner" - do not provide step-by-step instructions; let the Agent plan
  • "Operate the computer" - too vague; the goal needs to be explicit

What the Agent Can Do

Action | Description | Example
Single click | Click buttons, links, and other on-screen elements | Click the Confirm button
Double click | Open files or select text | Double-click a document on the desktop
Right click | Open context menus | Right-click a file to view properties
Type text | Enter content into an input box | Enter keywords into a search box
Open app | Launch a desktop application | Open a browser or Notepad
Keyboard input | Simulate keys or shortcuts | Ctrl+S to save, Enter to confirm
Hover | Trigger menus or tooltips by moving the mouse over an element | Hover to view a tooltip
Drag | Drag from one location to another | Drag a file into a folder
Scroll | Scroll vertically or horizontally | Scroll down to see more
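
For readers who want to reason about these actions programmatically, the table above can be modeled as a small data structure. This is only an illustrative sketch: the names `ActionType`, `Action`, and `describe` are hypothetical and are not part of any published Vision Exploration API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class ActionType(Enum):
    """One member per row of the action table (names are hypothetical)."""
    SINGLE_CLICK = "single_click"
    DOUBLE_CLICK = "double_click"
    RIGHT_CLICK = "right_click"
    TYPE_TEXT = "type_text"
    OPEN_APP = "open_app"
    KEYBOARD_INPUT = "keyboard_input"
    HOVER = "hover"
    DRAG = "drag"
    SCROLL = "scroll"


@dataclass(frozen=True)
class Action:
    """A single planned action; unused fields stay None."""
    kind: ActionType
    target: Optional[Tuple[int, int]] = None  # assumed: screen coordinates (x, y)
    end: Optional[Tuple[int, int]] = None     # assumed: drag destination (x, y)
    text: Optional[str] = None                # text to type, key combo, or app name

    def describe(self) -> str:
        """Return a short human-readable summary, e.g. for a step list."""
        parts = [self.kind.value]
        if self.target is not None:
            parts.append(f"at {self.target}")
        if self.end is not None:
            parts.append(f"to {self.end}")
        if self.text is not None:
            parts.append(f"'{self.text}'")
        return " ".join(parts)
```

For example, `Action(ActionType.KEYBOARD_INPUT, text="ctrl+s").describe()` returns `keyboard_input 'ctrl+s'`.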

Execution Process

  1. Observe the screen: the Agent captures the current screen
  2. Analyze and think: the AI examines the screenshot and decides the next action
  3. Perform the action: the Agent executes one action automatically, such as clicking a button
  4. Check the result: the Agent compares the before and after screenshots to see whether the action succeeded
  5. Repeat: the Agent continues until the task is complete

Throughout the process, you can see each action in real time in the step list.
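
The observe, analyze, act, check, and repeat cycle described above can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the Agent's actual implementation: `run_task` and its `capture`, `decide`, and `perform` callables are hypothetical, screenshots are represented as plain strings, and the `max_steps` limit is an added safeguard that the documentation does not mention.

```python
from typing import Callable, List, Optional


def run_task(
    goal: str,
    capture: Callable[[], str],
    decide: Callable[[str, str, List[str]], Optional[str]],
    perform: Callable[[str], None],
    max_steps: int = 20,
) -> List[str]:
    """Run the loop until decide() returns None or max_steps is reached."""
    steps: List[str] = []
    for _ in range(max_steps):
        before = capture()                    # 1. observe the screen
        action = decide(goal, before, steps)  # 2. analyze and pick the next action
        if action is None:                    # the planner reports the goal is met
            break
        perform(action)                       # 3. perform exactly one action
        after = capture()                     # 4. compare before and after
        outcome = "ok" if after != before else "no visible change"
        steps.append(f"{action}: {outcome}")  # record the step for the step list
    return steps                              # 5. the loop repeats until done


if __name__ == "__main__":
    # Toy stand-ins for a real screen and a real planner.
    screen = ["desktop"]
    log = run_task(
        "Open Notepad",
        capture=lambda: screen[0],
        decide=lambda goal, shot, steps: "open_app" if shot == "desktop" else None,
        perform=lambda action: screen.__setitem__(0, "notepad"),
    )
    print(log)  # ['open_app: ok']
```

Passing the capture, planning, and execution functions in as parameters keeps the loop itself independent of any particular screenshot or input library, which also makes it easy to exercise with fake screens as in the demo.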

Notes

  • Make sure the target application's interface is visible and not blocked. You can use Floating Window Mode to hide the bit-Agent window.
  • If an action fails, the Agent automatically tries alternative methods.
  • Complex tasks may require multiple steps, so please be patient.
  • If you need to cancel midway, you can stop the task at any time.