Engineering · May 30, 2026

MCP Spotlight: WebDriverIO MCP — Browser + Mobile App Automation, One Server, 34 Tools

The official WebDriverIO MCP server gives AI agents 34 tools to automate Chrome, Firefox, Edge, and Safari browsers — plus iOS and Android apps via Appium — all through a unified interface. Session recording, device emulation, and BrowserStack support included.

MCP Server · WebDriverIO · Browser Automation · Mobile Testing · Appium · AI Agents

Server: @wdio/mcp by WebDriverIO
Stars: 1.3k+ · License: MIT · Tools: 34 · Latest: v3.2.4 (updated today)
MCP Tracker: glama.ai/mcp/servers/webdriverio/mcp
Docs: webdriver.io/docs/mcp

Most browser automation MCP servers stop at the browser. WebDriverIO's official MCP server goes further: the same 34 tools control Chrome, Firefox, Edge, and Safari — plus iOS and Android native apps via Appium, all through one unified interface.

Install in one line:

npx -y @wdio/mcp@latest

What Sets It Apart

WebDriverIO MCP treats mobile as a first-class target. Unlike browser-only alternatives, it supports iOS simulators, Android emulators, and real devices alongside desktop browsers. It is built on the WebDriverIO framework, and that provenance matters when your agent runs headless automation in CI and depends on a mature driver layer underneath.

Capability | Details
Browsers | Chrome (headed/headless), Firefox, Edge, Safari
Mobile platforms | iOS (XCUITest), Android (UiAutomator2): simulators, emulators, and real devices
Transport | stdio (default) + optional HTTP for clients that can't launch subprocesses
Element detection | Cross-platform: CSS selectors, XPath, accessibility ID, iOS predicates, UiAutomator
Session recording | Every tool call automatically recorded, exportable as runnable WebDriverIO JS
Device emulation | Apply mobile/tablet presets (iPhone 15, Pixel 7) to simulate responsive layouts
BrowserStack | Built-in support for real iOS/Android devices and browser matrices in the cloud
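To make the element detection row concrete, here is a small sketch of what each locator strategy looks like as a string. The prefixes (`~`, `-ios predicate string:`, `android=`) are the ones WebDriverIO's own selector engine documents; whether every MCP tool accepts these exact strings is an assumption, and `detectStrategy` is a hypothetical helper written only for illustration.

```javascript
// Example selector strings, one per strategy named in the table.
// The element names and labels are made up for this sketch.
const exampleSelectors = {
  css: "nav a.navbar__link",
  xpath: "//button[text()='Get Started']",
  accessibilityId: "~loginButton",
  iosPredicate:
    "-ios predicate string:type == 'XCUIElementTypeButton' AND label == 'Login'",
  uiAutomator: 'android=new UiSelector().text("Login")',
};

// Hypothetical helper: guess which strategy a selector string uses
// from its prefix. Anything unrecognised is treated as CSS.
function detectStrategy(selector) {
  if (selector.startsWith("~")) return "accessibility id";
  if (selector.startsWith("-ios predicate string:")) return "ios predicate";
  if (selector.startsWith("android=")) return "uiautomator";
  if (selector.startsWith("/") || selector.startsWith("(")) return "xpath";
  return "css";
}

for (const [name, selector] of Object.entries(exampleSelectors)) {
  console.log(`${name.padEnd(16)} -> ${detectStrategy(selector)}`);
}
```

Accessibility IDs are usually the most portable choice for native apps, since the same `~id` string can resolve on both iOS and Android when the app sets matching identifiers.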

The Tool Set

The 34 tools break down into seven categories:

Session Management (4 tools): start_session, launch_chrome (with remote debugging), close_session (with detach support), emulate_device

Navigation & Page Interaction (7 tools): navigate, get_elements (viewport-filtered, paginated), get_accessibility_tree (role-filtered), get_screenshot (auto-resized ≤1MB), get_tabs, scroll, execute_script

Element Interaction (3 tools): click_element, set_value, switch_frame / switch_tab

Mobile Gestures (3 tools): tap_element, swipe (directional), drag_and_drop

Context Switching (2 tools): get_contexts, switch_context — seamless native ↔ webview transitions in hybrid apps

Device Control (7 tools): get_app_state, rotate_device, lock_device / unlock_device, get_geolocation, set_geolocation, show_keyboard / hide_keyboard, press_key

Cookie & BrowserStack (8 tools): get_cookies, set_cookie, delete_cookies, upload_app, list_apps, list_devices, take_healing_snapshot, get_session_logs
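As a quick sanity check on the breakdown above, the per-category counts can be summed in a few lines. The numbers are copied from the category headings, which appear to count slash-separated pairs such as lock_device / unlock_device as a single entry.

```javascript
// Per-category tool counts, taken from the headings in this section.
const toolCounts = {
  "Session Management": 4,
  "Navigation & Page Interaction": 7,
  "Element Interaction": 3,
  "Mobile Gestures": 3,
  "Context Switching": 2,
  "Device Control": 7,
  "Cookie & BrowserStack": 8,
};

const categoryCount = Object.keys(toolCounts).length;
const total = Object.values(toolCounts).reduce((sum, n) => sum + n, 0);

console.log(`${categoryCount} categories, ${total} tools`);
// prints "7 categories, 34 tools"
```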

Architecture: The Bridge Model

WebDriverIO MCP acts as a protocol bridge between AI assistants and automation engines:

┌─────────────┐    MCP (stdio)    ┌──────────────┐
│  AI Agent   │ ◄──────────────►  │  @wdio/mcp   │
└─────────────┘                   └──────┬───────┘
                                         │ WebDriverIO API
                    ┌────────────────────┼───────────────────┐
                    │                    │                   │
               Chrome/Firefox       Appium iOS          Appium Android
               (W3C WebDriver)      (XCUITest)          (UiAutomator2)
  • Single-session model: one active browser or app session at a time, state maintained globally across tool calls
  • Auto-detach: sessions with noReset: true automatically detach on close, preserving state
  • Smart element detection: on mobile, parses XML page source in 2 HTTP calls instead of 600+ traditional queries, generating multiple locator strategies per element
  • HTTP transport option: for clients that can't launch subprocesses (OpenAI Codex secure mode, llama.cpp), the server supports HTTP mode on any port
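To show what travels over the MCP (stdio) arrow in the diagram, here is a minimal sketch of a tool call as a JSON-RPC 2.0 message. The `tools/call` method and the `name`/`arguments` params come from the MCP specification; the `url` argument for `navigate` is an assumption, so check the server's published tool schema before relying on it.

```javascript
// Build a JSON-RPC 2.0 "tools/call" request in the shape MCP defines.
let nextId = 1;
function buildToolCall(name, args) {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// On the stdio transport each message is a single line of JSON
// terminated by a newline, so the payload must not contain raw newlines.
function frameForStdio(message) {
  return JSON.stringify(message) + "\n";
}

// Assumed argument name "url"; the real schema may differ.
const request = buildToolCall("navigate", { url: "https://webdriver.io" });
process.stdout.write(frameForStdio(request));
```

Because the server keeps one global session, a client would normally send a session-starting call first and then issue calls like this one against that same session.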

Session Recording: The Audit Trail Angle

Every tool call is automatically recorded and exportable as runnable WebDriverIO JavaScript. For teams using Facio as their agent runtime, this creates a natural handoff:

  1. Your agent automates a test flow via WebDriverIO MCP
  2. Facio captures every tool call in its audit trail
  3. The session recording exports as executable JS you can commit to your test suite

This closes the loop from agent-driven exploration to deterministic, repeatable CI tests — with full traceability at every step.
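The export step can be pictured roughly as follows. This is a sketch of the idea, not the server's actual recorder: the recording format, the argument names (`url`, `selector`, `value`), and the `exportAsScript` function are all hypothetical. Only the emitted calls (`browser.url`, `$(...).click()`, `$(...).setValue()`) are standard WebDriverIO API.

```javascript
// Hypothetical recording: one entry per tool call, in call order.
const recordedCalls = [
  { tool: "navigate", args: { url: "https://webdriver.io" } },
  { tool: "click_element", args: { selector: "a.getStarted" } },
  { tool: "set_value", args: { selector: "#search", value: "mcp" } },
];

// Map each tool name to a function that emits one line of test code.
// JSON.stringify quotes and escapes the string arguments safely.
const emitters = new Map([
  ["navigate", ({ url }) => `await browser.url(${JSON.stringify(url)});`],
  [
    "click_element",
    ({ selector }) => `await $(${JSON.stringify(selector)}).click();`,
  ],
  [
    "set_value",
    ({ selector, value }) =>
      `await $(${JSON.stringify(selector)}).setValue(${JSON.stringify(value)});`,
  ],
]);

function exportAsScript(calls) {
  return calls
    .map(({ tool, args }) => {
      const emit = emitters.get(tool);
      // Unknown tools become comments so nothing is silently dropped.
      return emit ? emit(args) : `// unsupported tool: ${tool}`;
    })
    .join("\n");
}

console.log(exportAsScript(recordedCalls));
```

The output is a flat sequence of awaited commands, which is the kind of code that can be pasted into a WebDriverIO test body and reviewed before it is committed.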

Facio Integration

{
  "mcpServers": {
    "wdio-mcp": {
      "command": "npx",
      "args": ["-y", "@wdio/mcp@latest"]
    }
  }
}

For BrowserStack real-device testing:

{
  "mcpServers": {
    "wdio-mcp": {
      "command": "npx",
      "args": ["-y", "@wdio/mcp@latest"],
      "env": {
        "BROWSERSTACK_USERNAME": "${credentials.BROWSERSTACK_USERNAME}",
        "BROWSERSTACK_ACCESS_KEY": "${credentials.BROWSERSTACK_ACCESS_KEY}"
      }
    }
  }
}

For HTTP transport (Facio agents running in containerized environments that can't spawn subprocesses):

npx @wdio/mcp --http --port 3000 --allowedOrigins "*"

Then configure the MCP endpoint as http://localhost:3000/mcp.
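To confirm the endpoint is reachable before wiring it into an agent, a small probe can send the MCP `initialize` handshake. This sketch assumes the server's HTTP mode follows MCP's Streamable HTTP transport (a JSON-RPC POST to the endpoint) and that protocol revision 2025-06-18 is accepted; the `probe` client name and version are placeholders.

```javascript
// JSON-RPC "initialize" request, the first message of an MCP session.
function buildInitializeRequest() {
  return {
    jsonrpc: "2.0",
    id: 1,
    method: "initialize",
    params: {
      protocolVersion: "2025-06-18", // assumed revision
      capabilities: {},
      clientInfo: { name: "probe", version: "0.0.1" }, // placeholder identity
    },
  };
}

// POST the handshake and return the HTTP status code.
// Uses the global fetch available in Node 18 and later.
async function probe(endpoint) {
  const response = await fetch(endpoint, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Streamable HTTP servers may reply with plain JSON or an SSE stream.
      Accept: "application/json, text/event-stream",
    },
    body: JSON.stringify(buildInitializeRequest()),
  });
  return response.status;
}

// Usage (requires the server started by the command above):
// probe("http://localhost:3000/mcp").then((status) => console.log(status));
```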

Quickstart Examples

Browser automation:

"Open Chrome headless and navigate to https://webdriver.io. Take a screenshot, find all visible links in the nav bar, and check if the 'Get Started' button is present."

Mobile web testing with device emulation:

"Start a Chrome session, emulate an iPhone 15, navigate to our checkout page, and take a screenshot at 390×844."

Native iOS app testing:

"Start my iOS app on the iPhone 15 simulator. Tap the login button, type 'test@example.com' into the email field, swipe up to scroll to the submit button, and take a screenshot."

Hybrid app context switching:

"Launch the app on Android. Check available contexts, switch to WEBVIEW_com.myapp, find the search input, and type 'test query'."

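Behind the hybrid app prompt, the agent has to pick the right entry from the list of available contexts. A minimal sketch of that choice, assuming the contexts arrive as plain strings in Appium's usual `NATIVE_APP` / `WEBVIEW_<package>` naming (`pickWebviewContext` is a hypothetical helper, not part of the server):

```javascript
// Return the first webview context whose name mentions the app package,
// or null when the app exposes no matching webview.
function pickWebviewContext(contexts, appPackage) {
  return (
    contexts.find(
      (name) => name.startsWith("WEBVIEW_") && name.includes(appPackage)
    ) ?? null
  );
}

const contexts = ["NATIVE_APP", "WEBVIEW_chrome", "WEBVIEW_com.myapp"];
console.log(pickWebviewContext(contexts, "com.myapp"));
// prints "WEBVIEW_com.myapp"
```

Filtering by package name matters on Android, where other apps' webviews (such as a running Chrome instance) can show up in the same list.
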
BrowserStack on real device:

"Start a BrowserStack session on a Samsung Galaxy S23 running Android 13, upload my app .apk, install it, and run through the signup flow."

When to Choose WebDriverIO MCP

WebDriverIO MCP is the right choice when:

  • You need mobile + browser from a single MCP server, not two separate ones
  • You're already in the WebDriverIO ecosystem and want session recordings to feed into your test suite
  • You need real-device testing via BrowserStack without switching toolchains
  • Your agent needs Appium-level device control (geolocation, rotation, keyboard, app lifecycle) — not just viewport emulation

For pure browser-only workflows, simpler alternatives exist. For cross-platform test automation driven by an AI agent, WebDriverIO MCP covers desktop browsers, native mobile apps, and cloud-hosted real devices from a single server.

Bottom Line

WebDriverIO MCP packs 34 tools across browsers and mobile platforms into a single, actively maintained package from a team that's been building test automation for over a decade. The session recording feature — combined with Facio's built-in audit trail — creates a clean path from agent-driven exploration to reproducible CI tests.

With npx -y @wdio/mcp@latest, evaluation needs no setup beyond the config: add the JSON and ask your agent to open a browser.


MCP Spotlight is a series covering servers that give AI agents real capabilities. Every server is evaluated for tool quality, cross-platform reach, and integration fit with Facio's HITL-first agent runtime.