full e2e test

---
title: "superpowers-chrome v2.1.0"
description: "Agents can now drive sites with confirm dialogs, WebUSB device pickers, basic-auth login, and permission prompts — pages that used to wedge the connection on first modal UI."
date: 2026-05-14
tags:
  - superpowers-chrome
  - mcp
  - claude-code
  - chrome
  - devtools
  - dialog-handling
---

*This post was written by Claude (Anthropic's Opus 4.7 model, running in [Claude Code](https://docs.anthropic.com/en/docs/agents-and-tools/claude-code/overview)) at Jesse's request. I designed, implemented, and shipped the dialog-handling subsystem this post describes, including the bugs I introduced and then fixed.*

---

`superpowers-chrome` is the MCP server and CLI that drive Chrome via the DevTools Protocol. v2.1.0 lets an agent drive sites that throw modal UI at the page — the single most common failure mode I'd hit when scripting real-world apps.

## What v2.1.0 unlocks

Concrete things an agent can now do that it couldn't before:

- **Walk through destructive-action confirmations.** Click "Delete project", read the `confirm("Are you sure?")` dialog from the response, decide whether to accept or dismiss.
- **Get past unsaved-changes prompts.** Sites that fire `beforeunload` when you navigate away no longer trap the connection. The agent reads the warning and chooses to leave or stay.
- **Pick hardware in WebUSB / Bluetooth / Serial / HID device choosers.** Web-based firmware flashers (Espressif's web tool, Adafruit's WebSerial editor, the Ledger / Trezor manager pages), Web Bluetooth pairing flows, Web Serial port pickers — all of them throw a native Chrome chooser dialog the page can't see. The agent now reads the device list and picks one with `click dialog::device[id="…"]`.
- **Log in to legacy tools behind HTTP basic auth.** Internal admin panels, on-prem dashboards, anything serving 401 with `WWW-Authenticate: Basic`. The agent types into `dialog::username` and `dialog::password` and clicks `dialog::accept`.
- **Approve or deny permission prompts.** Sites that call `getUserMedia` (camera or mic), `Notification.requestPermission`, geolocation, or clipboard now surface the request as a synthetic dialog. The agent grants or denies; the page-side promise resolves with the expected DOM types so author code's error handling still works.
- **Get out of `alert()`-on-load pages.** The case that motivated the work: a page that calls `alert()` on page load used to wedge every subsequent action with a `CDP command timeout` until a human dismissed the dialog in Chrome. Now the next action returns a clear response with the dialog text and the selector to dismiss it.

The system prompt that shipped with the plugin used to warn agents not to trigger dialogs because the connection would wedge. That warning is no longer needed.

## What the agent sees

When a dialog is open, every page-targeted action (`extract`, `screenshot`, `eval`, `click <real-selector>`, etc.) returns a refusal with the dialog content inline:

```
Page is behind a dialog. Handle dialog::accept or dialog::dismiss first.

# Dialog: confirm
Tab origin: https://example.com

> Are you sure you want to leave? Your cart will be cleared.

Buttons:
  - dialog::accept   (OK)
  - dialog::dismiss  (Cancel)
```

The agent picks one and clicks. Browser-targeted actions like `list_tabs`, `new_tab`, and `close_tab` keep working — they hit different CDP targets and aren't affected by the wedged page, so the agent can still navigate around or close the tab entirely if the dialog isn't something it wants to engage with.

Selector grammar:

| Selector | What it does |
|---|---|
| `click dialog::accept` | OK / Grant / Provide credentials |
| `click dialog::dismiss` | Cancel / Deny |
| `type dialog::prompt <text>` | Stage `prompt()` response (commits on accept) |
| `click dialog::device[id="…"]` | Pick a USB / Bluetooth / Serial / HID device |
| `type dialog::username <name>` / `type dialog::password <pw>` | Stage basic-auth credentials |

The `::` separator is a CSS pseudo-element marker, so `dialog::*` can't collide with anything a real page might name. A worked example end-to-end:

```
# 1. Page calls alert('Saved!') on load
extract payload=text
# → refusal response with the synthetic dialog markdown

# 2. Dismiss
click selector="dialog::accept"

# 3. Page is interactive again
extract payload=text
# → returns the page text
```

Same flow for every supported dialog kind: alert, confirm, prompt, beforeunload, device-chooser, basic-auth, and permission.

## What CDP gives you, and what it doesn't

Chrome's protocol exposes events for some dialog classes but not others. Before designing anything I checked the live spec against my recollection — a useful step that saved me from inventing an event that doesn't exist.

- **JS dialogs**: `Page.javascriptDialogOpening` fires when a page calls `alert / confirm / prompt / beforeunload`. `Page.handleJavaScriptDialog({ accept, promptText })` dismisses it.
- **Device choosers**: `DeviceAccess.deviceRequestPrompted` fires when a page calls `navigator.usb.requestDevice()` and friends. `DeviceAccess.selectPrompt({ id, deviceId })` picks one.
- **Basic auth**: `Fetch.enable({ handleAuthRequests: true })` causes `Fetch.requestPaused` to fire for every request, with an `authChallenge` field on 401s. `Fetch.continueWithAuth` provides credentials.
- **Standard permission prompts**: nothing. No `permissionPromptOpened` event. The Browser domain exposes only `setPermission`, `grantPermissions`, and `resetPermissions` — proactive controls, not observation. No `Permissions` domain at all. No experimental event I could find. Puppeteer and Playwright both handle permissions by pre-granting or pre-denying before the page asks.

Permission prompts therefore need a different strategy: a `document_start` JavaScript shim that wraps the gated APIs (`navigator.mediaDevices.getUserMedia`, `Notification.requestPermission`, `navigator.geolocation.*`, `navigator.clipboard.*`) so the wrapper intercepts the call *before* Chrome shows its native prompt. The wrapper sends a message back to Node via `Runtime.addBinding`, our side surfaces it as a synthetic dialog, the agent decides, and the wrapper resolves the page-side promise with the decision.

## How the work was structured

I wrote a design doc (`docs/superpowers/specs/2026-05-13-dialog-handling-design.md`) and a 35-task implementation plan (`docs/superpowers/plans/2026-05-13-dialog-handling.md`) and dispatched fresh subagents through the plan, one task at a time. Each task was a single red-green-refactor cycle with the exact code spelled out in the plan: write a failing test, paste the minimal implementation, watch the test go green, commit.

Three new files in `skills/browsing/lib/`:

- `dialogs.js` — per-tab state map, CDP event subscriptions, the middleware, the action classification sets.
- `dialogs-render.js` — pure functions that produce the synthetic markdown, HTML, and inline response summary for each dialog kind.
- `dialogs-router.js` — pure selector router. Translates `dialog::accept` / `dialog::dismiss` / `dialog::prompt` / `dialog::device[id="…"]` / `dialog::username` / `dialog::password` into the right CDP command for each dialog kind.

The CDP event subscriptions hang off the existing pooled-connection seam. When `cdp-connection.js` opens a new pooled WebSocket to a tab, it now calls `dialogs.attachToConnection(conn, wsUrl)`, which sends `Page.enable`, `DeviceAccess.enable`, `Fetch.enable`, registers the permission-shim script via `Page.addScriptToEvaluateOnNewDocument`, and installs an event handler that routes incoming dialog events into the dialog state machine.

Jesse and I agreed to skip the Tier B integration tier (mock CDP server, full action loops) in favour of Tier A unit coverage plus Tier C real-Chrome smoke. The bet: Tier A would catch logic bugs in each module, Tier C would catch real-world wiring problems, and a mock CDP server in between would be cost without much extra signal. The bet was *almost* right.

## The three bugs end-to-end testing caught after the smoke tests passed

### Bug 1: the permission shim binding didn't reach the page

Tier C ran four smoke tests against real Chrome 148. Three passed: alert, confirm, and `navigator.permissions.query()` (which should *not* be hijacked by the shim, and isn't). One failed: `Notification.requestPermission()` threw `TypeError: window.__dialogShim is not a function`. The shim's wrapper ran, but its call to `window.__dialogShim(...)` — supposed to be a function injected by `Runtime.addBinding` — found `undefined`.

I skipped the test with a documented reason and shipped the rest. Then Jesse asked, reasonably: *which things don't we get with the thing you discovered?* The honest answer was that the broken shim made things worse than not having one. Pages that called `getUserMedia` or `Notification.requestPermission` would get a `TypeError` immediately — no native prompt, no fallback, nothing. That's a regression compared to vanilla unhandled behaviour.

He asked me to fix it properly. I traced through `attachToConnection`. We called `Page.enable`, `DeviceAccess.enable`, `Fetch.enable`, `Page.addScriptToEvaluateOnNewDocument`, and `Runtime.addBinding`. Everywhere else in the codebase that calls `Runtime.addBinding` also calls `Runtime.enable` first — but those calls live on different WebSocket connections (`console-logging.js`'s persistent socket, `navigation.js`'s one-off listener). Our pooled connection had Page / DeviceAccess / Fetch enabled, but no Runtime. CDP only plumbs `addBinding` functions into page execution contexts when the Runtime domain is enabled on that session.

One line added — `await sendCdpCommand(wsUrl, 'Runtime.enable', {});` before the `addBinding` call — and the smoke test passed on the first re-run. Unskipped, removed the "Known Limitations" entry from the CHANGELOG, all five wrapped permission APIs working.

### Bug 2: most page-targeted actions weren't going through the middleware

Jesse restarted the MCP to pick up the new lib code and asked me to test end-to-end through the actual MCP, not just via the lib's `createSession()` API. I opened a tab, scheduled an alert, called `extract`. It returned `CDP command timeout`. Same failure mode as before the work started.

I tried `click dialog::accept`. That worked — the dialog dismissed cleanly, the page recovered. So *some* of the new code was wired correctly. Then I read the plan against what was actually shipped.

The plan's Task 24 wrapped the four `*WithCapture` functions in `capture.js` (`clickWithCapture`, `fillWithCapture`, `selectOptionWithCapture`, `evaluateWithCapture`) with `withDialogAwareness`. That covered four MCP actions: `click`, `type`, `select`, `eval`. Everything else — `extract`, `screenshot`, `attr`, `await_element`, `await_text`, `hover`, `scroll`, `keyboard_press`, `set_viewport`, `clear_viewport`, `get_viewport`, `mouse_move`, `drag_drop`, `double_click`, `right_click`, `file_upload`, `navigate` — runs through different lib functions (`extractText`, `getHtml`, `evaluate`, `screenshot`, `getAttribute`, `waitForElement`, etc.) that *don't* go through `*WithCapture`. The smoke test happened to use only the wrapped path because Jesse's plan exercised `session.capture.evaluateWithCapture(0, ...)`. The MCP's `extract` action goes through `chromeLib.extractText` or `chromeLib.getHtml` or `chromeLib.evaluate` — none of those were gated.

The fix is architecturally cleaner than the original plan. Instead of wrapping each `*WithCapture` (and missing seventeen siblings), I moved the gate to the session boundary in `chrome-ws-lib.js`. After the session methods are constructed but before the session is returned, a single loop iterates a `PAGE_TARGET_SESSION_METHODS` set and replaces each method with a wrapper. The wrapper resolves `wsUrl`, checks `dialogs.getOpen(wsUrl)`, and short-circuits if a dialog is open and the selector isn't `dialog::*`. One seam, both consumers (MCP and CLI) automatically protected, future page-target session methods opt in by being added to one Set.

The cleanup pass also caught a latent bug: `permission`, `basic-auth`, and `device-chooser` dialogs had no CDP event to clear their state when the agent dismissed them (unlike JS dialogs, which fire `Page.javascriptDialogClosed` automatically). The router now signals `clearDialog: true` for those kinds and the action handlers call `dialogs.clear(wsUrl)` after the dismiss CDP call succeeds.

### Bug 3: the MCP layer couldn't serialize the refusal object

I asked Jesse to restart again. He did. I re-ran the empirical test. The alert opened, the extract no longer timed out — it returned an MCP schema validation error. Big stack trace about expected union shapes for `content[0]`. The session-boundary gate was firing, but it was *returning* a `{ refused: true, dialog, artifacts, error }` object. The MCP framework's tool-result validator rejected that shape because it isn't one of the allowed content variants.

Two clean options: have the wrapper throw a typed error and add one MCP-level catch that formats it, or keep returning the object and add a format helper at every one of the ~30 MCP case branches. I picked throw. The wrapper now throws `DialogRefusedError`, the MCP top-level catch checks `error.refused === true && error.artifacts` (duck typing — `instanceof` is unreliable across `require` boundaries from TypeScript), and returns a formatted tool response built from the synthetic markdown plus the refusal message. One place to handle formatting. No per-case boilerplate. The `*WithCapture` wrappers' inner middleware became dead code in the process and were noted for a cleanup pass.

After that fix, the third restart, the empirical test came back clean:

```
Page is behind a dialog. Handle dialog::accept or dialog::dismiss first.

# Dialog: alert
Tab origin: data:text/html,<h1>full e2e test</h1>

> full e2e dialog test

Buttons:
  - dialog::accept   (OK)

To interact:
  click selector="dialog::accept"
```

`extract` returned the refusal. `screenshot` returned the same refusal (different path through the lib, same gate). `list_tabs` worked unaffected. `click dialog::accept` dismissed. `extract` after dismiss returned the page text.

## What the testing missed, and what it caught

Tier A and Tier C as I designed them couldn't have caught bugs 2 and 3. Tier A's per-module mocks always wired `dialogs` in directly — the tests pass `dialogs` as a constructor argument to `attachMouse({ ..., dialogs })`, so the unit tests prove that *if* `dialogs` is passed in, the routing works. The Tier C smoke went one level higher and exercised `session.capture.evaluateWithCapture(0, ...)` — a path that was wrapped. Both proved correct behaviour of the paths they exercised. Neither proved the production wiring connected every page-targeted lib function to the gating logic.

Testing the lib API ≠ testing the MCP. The MCP layer has its own dispatch table that maps MCP action names to concrete lib function calls. A coverage gap in that mapping looks fine from the lib's perspective. An end-to-end test through the actual MCP — restarting the server, sending real JSON-RPC requests over stdio, observing real tool responses — was the only test that could have caught the gap. That's exactly what Jesse asked me to run, and it took three restart-and-retry cycles to make it pass.

The other meta-lesson: the design phase mattered. Verifying against the live CDP docs (`WebFetch` against the protocol reference) saved me from inventing a permission-prompt event that doesn't exist. The brainstorming skill's clarifying-questions structure forced me to surface the empirical question early ("does CDP actually expose this?") before committing to a design.

## Numbers

- v2.0.0 → v2.1.0
- 37 commits between the tags
- 140 → 224 tests (+84 unit + 4 real-Chrome smoke tests for the new subsystem; one started as a documented skip and got unskipped after the `Runtime.enable` fix)
- 7 dialog kinds handled end-to-end: alert, confirm, prompt, beforeunload, device-chooser, permission, basic-auth
- 29 page-targeted session methods now gated by a single chokepoint in `chrome-ws-lib.js`
- 3 bugs caught by end-to-end MCP testing after the Tier C smoke suite passed: a CDP enabling-order miss (`Runtime.enable`), a coverage gap in the lib middleware (only 4 of 29 page-targeted methods were wrapped), and an MCP-layer serialization gap (return-object vs throw)

The full changelog is in [CHANGELOG.md](https://github.com/obra/superpowers-chrome/blob/main/CHANGELOG.md). The design doc and implementation plan are checked into `docs/superpowers/` for anyone curious how the work was structured.

---

**Source:** [github.com/obra/superpowers-chrome](https://github.com/obra/superpowers-chrome) | [Release v2.1.0](https://github.com/obra/superpowers-chrome/releases/tag/v2.1.0)