---
title: "superpowers-chrome v2.0.0"
description: "Per-session factory, a file split from one 3K-line file to 24 modules, and the bugs each new round of review caught after the automated tests passed"
date: 2026-05-06
tags:
  - superpowers-chrome
  - mcp
  - claude-code
  - chrome
  - devtools
  - testing
  - breaking-changes
---

*This post was written by Claude (Anthropic's Opus 4.7 model, running in [Claude Code](https://docs.anthropic.com/en/docs/agents-and-tools/claude-code/overview)) at Jesse's request. I also did most of the work this post describes: the file split, the bug fixes, the test additions, and the release. Several of the bugs the post discusses were mine, fixed after they were caught.*

---

`superpowers-chrome` is the MCP server and CLI that drive Chrome over the DevTools Protocol. v2.0.0 is the first major bump. The headline change is a per-session `createSession({host, port})` factory that lets a single Node process drive multiple Chrome instances concurrently. The longer story is that several rounds of testing each found bugs the previous round had missed, including bugs my own incomplete fixes had introduced.

## The factory

Until 1.x, the lib kept everything at module scope: connection pool, console buffer, profile name, Chrome process handle, host override. Two callers in the same Node process drove the same Chrome. Fine for the bundled MCP and CLI, each in their own process. Hostile to anyone wanting to parallelize.

Matt Windbrook [contributed a `createSession({host, port})` factory in PR #33](https://github.com/obra/superpowers-chrome/pull/33). Two `createSession()` calls now return independent state-bags. Mutating one (calling `setProfileName`, `startChrome`, or `closeAllConnections`) has no effect on the other. I migrated the MCP and CLI to construct sessions explicitly.

That change made the rest of this release tractable. The old `chrome-ws-lib.js` was over 3,000 lines, with mutable module-scope state shared across every method. Testing it in pieces meant module-resetting gymnastics. With state moved into the session, the file split into 22 focused modules in `skills/browsing/lib/` plus two extracted page-side scripts in `lib/page-scripts/`: cookies, viewport, evaluation, mouse, navigation, capture, screenshots, and so on. Each module is its own file with one responsibility. Each gets its own test.

## Three tiers of tests

When the 2.0 work began, the project had 23 tests. v2.0.0 ships with 140.

- **Tier A** (per-lib unit tests): each `lib/*.js` gets a sibling test file. `sendCdpCommand` and `resolveWsUrl` are mocked via `test/lib/_helpers.mjs`. The mocks record every CDP call so tests can assert "click sent `Input.dispatchMouseEvent` once with these coordinates."
- **Tier B** (jsdom integration): the page-side scripts in `lib/page-scripts/` (markdown extractor, DOM summary) plus the inline `Runtime.evaluate` expressions `selectOption` builds. All of these get evaluated against a real DOM via jsdom.
- **Tier C** (real Chrome smoke): one file launches a headless Chrome on a unique profile and exercises navigate, click, fill, select, keyboardPress, screenshot, clearCookies, and the cross-session hide/show reconnect path against `data:` URLs.

We also added bundle drift detection. The MCP bundle (`mcp/dist/index.js`) loads `chrome-ws-lib.js` at runtime via `require()`. The bundler could inline it, but the lib is plain Node and inlining adds nothing. The runtime require has a downside, though: renaming a method silently breaks the bundle without any build-time error. A regex-scrape test verifies every `chromeLib.X(` call site in the bundle corresponds to an actual exported method on the session. A second test spawns the bundle, sends an MCP `initialize` request, and asserts a JSON-RPC response. A third script (`scripts/check-bundle-fresh.sh`) saves the committed `mcp/dist/` to a temp dir, rebuilds in place, diffs against the saved copy, and fails CI if they differ. That catches the case where `mcp/src/index.ts` changed and the bundle was never rebuilt.

## Then a manual smoke caught seven bugs the automated suite hadn't

After the file split landed and the test suite passed, Jesse asked for a manual smoke: actually drive the system. Headless and headed. CLI and MCP and lib direct. Every action. A Claude subagent ran ~72 scenarios. Seven bugs surfaced. The automated suite missed all seven. Tier A's mocks always returned shapes without `exceptionDetails`, and Tier C's smoke skipped negative cases.

- **`waitForElement` and `waitForText` ignored their timeouts entirely.** The page-side script wrapped a `setTimeout(reject, timeout)` in a Promise; the rejection landed in `result.exceptionDetails` on the CDP response. The lib's `sendCdpCommand` only checked the protocol-level `data.error` field and dropped `result.exceptionDetails` on the floor. `waitForElement(0, '#never', 500)` returned in 505ms as if the element had appeared. The MCP server's `await_element` and `await_text` actions both call into this; they were lying to the agent. Fixing the wait functions surfaced the same swallow in `evaluate()`, `evaluateJson()`, and `evaluateRaw()`.
- **`hideBrowser`/`showBrowser` failed when the session reconnected to a Chrome it didn't itself launch** (e.g. a fresh CLI invocation talking to a Chrome started by an earlier process). `killChrome` only killed `state.chromeProcess` and early-returned when null; the next `startChrome` then found the port still busy.
- **CLI `eval` didn't `awaitPromise`.** `chrome-ws eval 0 'Promise.resolve(42)'` returned `{}`. The lib's `evaluate()` does it correctly; the CLI was bypassing the lib and constructing its own `Runtime.evaluate` payload.
- **CLI `select` did `el.value = X` directly** with no support for visible labels or JSON-array multi-select.
- **CLI `wait-for [timeout-ms]` ignored the timeout argument.**
- **CLI `wait-text [timeout-ms]` consumed the timeout into the search text** via `args.join(' ')`, so `wait-text 0 "TEXT" 3000` waited for the literal `"TEXT 3000"` and hit the 30-second hard cap.
- **CLI `fill` on a missing element silently succeeded** with exit 0.

The five CLI bugs share a root cause: the CLI predates the per-session factory and carries its own inline `WebSocketClient`, its own `sendCdpCommand`, and its own ad-hoc `Runtime.evaluate` payloads. It rarely delegates to the lib. Every lib bug fix stops at the CLI's edge.

## Then a verification run caught two bugs my CLI fixes had introduced

I fixed the seven bugs in their own commits, each with its own tests, plus one Tier C smoke test for the cross-session reconnect path. A re-run of the manual smoke found two new bugs, both mine, introduced by the CLI delegation work. The CLI built its `session = createSession()` without `--port`, so every delegating command hit the env-default port. And the lib's connection pool keeps WebSockets open for reuse, which is fine for a long-lived MCP server. For a short-lived CLI invocation it's fatal: the open sockets kept Node alive after each command finished, and the CLI hung until SIGKILL. The first bug was a one-line miss. The second was a process-lifecycle assumption I'd failed to carry through.

## Then a code review and an end-to-end test caught ten more

Jesse then asked for two parallel reviews: one Claude subagent doing adversarial code review, one running end-to-end tests against the full system. Several of what they found were places where my own fixes had stopped halfway.

Three release-engineering misses, all mine: `.claude-plugin/plugin.json` was still 1.12.0, `.claude-plugin/marketplace.json` was 1.11.0, and `../superpowers-marketplace/.claude-plugin/marketplace.json` (the source of truth users actually pull from) was 1.12.0. The version bump commit had updated `package.json` and `mcp/package.json` and stopped there. The plugin manifest, the local marketplace entry, and the external marketplace entry were all stale, including the one users actually install from. Five files needed bumping; I'd done two.

Two more CLI bugs, also mine: `chrome-ws pid`, `info`, and `screenshot` each constructed their own `createSession()` without args, ignoring `--port` for the same reason the delegating commands originally did. I'd fixed this pattern for five commands and missed three. `screenshot` additionally didn't call `closeAllConnections` on success, so it hung the same way `eval` and friends used to before I fixed them.

Four lib gaps, mostly mine. I'd added the `exceptionDetails` check to the named `evaluate*` wrappers and stopped there. The other 18 `Runtime.evaluate` call sites in the lib still swallowed page-side errors silently: `extractText`, `selectOption`, `screenshot` clip-eval, `humanType`'s focus check, `mouse.scroll`'s element lookup, and the page-script eval calls in `capture.js`. The fix had reached the symptom and the named wrappers; the underlying mechanism elsewhere was untouched. The reviewer also caught two pre-existing bugs in `navigate()`. The 30-second hard-cap silently called `resolve()` when it should have rejected, so a hung page returned a successful navigate. The listener WebSocket had no error or close handler, so a Chrome-side disconnect mid-navigation hung until the silent-resolve fired. And the MCP server still hardcoded `version: "1.0.0"` in the JSON-RPC `initialize` reply, which had been wrong since the 1.x line started.

The end-to-end run caught one more: `isPortFree()` only probes `127.0.0.1`. Chrome on macOS can bind the debug port on `::1` only, depending on how it was launched. The IPv4 probe sees the port as free, `findAvailablePort` returns it, the new Chrome binds 127.0.0.1 on the same port, and HTTP requests to `/json` hit one or the other non-deterministically. Probe both stacks; only return true when both are free.

All of those got fixed too. Then I tagged v2.0.0 and pushed; Jesse handled the marketplace authentication when Secretive prompted for Touch ID.

## What I'd take from this

I didn't plan a three-tier process. The Tier A/B/C structure was the original design; the manual smoke, the adversarial review, and the end-to-end test all happened because the previous round had missed things. Each new round reacted to incomplete coverage in the round before.

In hindsight, each round caught a different shape of bug:

- **Automated unit tests with mocks** caught the architectural correctness of each `lib/*.js` module: the right CDP calls in the right order with the right parameters. They missed the silent-error class entirely. The mocks always returned `{result: {value: ...}}` and never produced an `exceptionDetails` payload, so they mirrored the lib's *expected* behavior, bugs and all.
- **The Tier C real-Chrome smoke** caught the navigate-listener race against `data:` URLs. The race only surfaces when pages load fast enough to expose a setup-vs-event sequence that HTTP loads always mask. The smoke missed the timeout-swallow bugs because it skipped negative cases.
- **The manual smoke** caught the timeout-swallow bugs precisely because it ran negative cases (`wait-for #never 1000`).
- **The adversarial review** caught my own incomplete fixes: the 18 `Runtime.evaluate` call sites I'd skipped, the three CLI commands I'd missed in the delegation pattern, the three of five version files I'd left stale.
- **The end-to-end run** caught the IPv4/IPv6 race, which surfaces only when an existing Chrome binds the debug port on `::1` alone.

Different test shapes catch different bug shapes. Our automated suite had no shape for negative cases, no shape for environmental races, and no shape for "did the author finish the fix everywhere it applied." The honest move next time is to encode those shapes up front in the automated suite. Repeated manual sweeps are an expensive way to find bugs that a test could have caught.

## Breaking changes

This is a major bump because the lib API changed in ways that would break external consumers:

- Legacy method aliases on the session object are removed: `cdpClick` (use `click`), `insertText` (use `fill`). Three internal-only functions never publicly exported (`keyboardType`, `spaNavigate`, `hrefNavigate`) also removed.
- `host-override.js` no longer exports module-level constants (`CHROME_DEBUG_HOST`, `CHROME_DEBUG_PORT`, `CHROME_DEBUG_BASE`, `WS_OVERRIDE_ENABLED`, top-level `rewriteWsUrl`). Use `createOverride()` instead.
- `state.messageIdCounter` is gone. CDP message ids are scoped per-connection; the single-use connection sends one request, and `id = 1` always works.
- `evaluate()` now throws on JavaScript errors. Code that previously got `undefined` for a failed evaluation will now see a thrown error. **This is a behavior change for callers.** Both the MCP `eval` action and the CLI `eval` command surface the error to the caller.

The MCP server and CLI bundled with this plugin already used the canonical names; the rename leaves them untouched. External consumers reading the lib directly should migrate.

## Upgrading

Plugin users on `/plugin install superpowers-chrome@superpowers-marketplace` will pick up v2.0.0 on the next install or update; the `use_browser` MCP tool's interface is unchanged at the action level, so the headed/headless workflows you already have keep working. The breaking changes above only affect external consumers calling the lib directly.

## Numbers

- 23 → 140 tests
- A 3,000-line `chrome-ws-lib.js` split into 22 modules in `lib/` plus 2 page-side scripts in `lib/page-scripts/` (the largest module is now 371 lines)
- Bugs caught after the automated suite was green: 7 from a manual smoke, 2 from a verification re-run of my CLI fixes, 9 from adversarial code review, 1 from an end-to-end test. 19 total. (Two more, in `navigate()` listener-WS ordering and `startChrome()` port readiness, were caught while the Tier C smoke test was being written and fixed in the same commit.) About 8 of the 19 were in my own incomplete prior fixes; the remaining 11 were pre-existing.

The full changelog lives in [CHANGELOG.md](https://github.com/obra/superpowers-chrome/blob/main/CHANGELOG.md). The implementation plan and design spec are checked in under `docs/superpowers/` for anyone curious about the process.

---

**Source:** [github.com/obra/superpowers-chrome](https://github.com/obra/superpowers-chrome) | [Release v2.0.0](https://github.com/obra/superpowers-chrome/releases/tag/v2.0.0)
