---
type: post
title: Superpowers 6
date: 2026-06-15
tags: [superpowers, agents, coding, claude, codex, fable, optimization]
---

You can also read this post on our corporate blog at [https://primeradiant.com/blog](https://primeradiant.com/blog)

TL;DR: Superpowers 6 is much, much faster and burns far fewer tokens to get the same high-quality outcomes. If you're tokenmaxxing, maybe skip this release, but if you care about your builds being up to 50% faster and up to 60% cheaper, you're going to love Superpowers 6.

A week ago, we were gearing up to release Superpowers 5.2. We'd slipped the release a couple of times already to add "just one more improvement."

We added support for Pi, Antigravity and Kimi Code. 

We made Superpowers work better on Codex, OpenCode, and Cursor.

We rewrote a bunch of the Superpowers skills to be model and harness agnostic, which helps them be more reliable everywhere. We also wrote a new contribution guide for how to add support for a new coding agent harness for Superpowers.

We did a bunch of work to make Visual Brainstorming easier to use, safer, and more reliable.

And we fixed a whole slew of bugs, including a particularly nasty one that led to code review subagents sometimes reviewing the whole branch, rather than a single task.

It was going to be a great release.

And then Anthropic shipped (and unshipped) Fable. In the few days that I had access to Fable, I put it to the best use that I could.

It's no secret that the most common lament we hear from Superpowers users is that tokens are expensive and Superpowers uses a ton of them. Building software with Superpowers is slower than building without it, too. The "slow" part shouldn't matter - it happens during the autonomous subagent driven development orchestration of the build process. 

But it does matter. Slow isn't fun. And expensive isn't fun either.

A bunch of the reasons that Superpowers builds have taken longer and cost more are the same reasons that it delivers good outcomes for so many users. It does a ton of up-front planning work to make sure your implementations can be hands-off, forces strict red-green TDD while implementing, and then the orchestrator inside Superpowers reviews every single change on two axes:
1) Did the agent implement exactly what was asked, no more and no less?
2) Is the quality of the work up to snuff?

Just by the nature of what it's doing, it's going to be slower than yoloing an untested implementation and calling it a day. 

But it's never made me *happy* that it's slow and expensive.

When Fable came out, I decided to see how well it could optimize Subagent Driven Development. 

I think I was hoping for something like a 15% reduction in token spend. 

I got that. And a whole lot more.

Our first angle of attack was looking at the coordinator-to-reviewer handoff. Fable analyzed thousands of Subagent Driven Development sessions and found that code and spec-compliance review subagents sometimes ran a *lot* of git commands while doing their reviews. Simply replacing the written instructions for how to find the commits to review with a shell script that pre-generates a review package, containing a well-formatted diff and some other metadata, decreased token spend and wall-clock time by about 10%.
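
To make that concrete, here's a minimal sketch of what pre-generating a review package could look like. It's written in Python rather than shell for readability, and it is not the script that ships in Superpowers: the `ReviewPackage` fields, the `build_review_package` helper, and the output layout are all assumptions made up for illustration. The only real tools it leans on are `git log` and `git diff`.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class ReviewPackage:
    """Everything a reviewer needs, gathered once by the coordinator (hypothetical shape)."""
    task_id: str        # hypothetical label for the task under review
    base: str           # commit just before the task's work began
    head: str           # last commit belonging to the task
    commits: list[str]  # one-line summaries of the commits in base..head
    diff: str           # diffstat plus unified diff for base..head

    def render(self) -> str:
        """Format the package as one markdown document to hand to the reviewer."""
        commit_lines = "\n".join(f"- {c}" for c in self.commits) or "- (none)"
        return (
            f"# Review package: {self.task_id}\n\n"
            f"Range: {self.base}..{self.head}\n\n"
            f"## Commits\n{commit_lines}\n\n"
            f"## Diff\n{self.diff}\n"
        )


def git(*args: str, cwd: str = ".") -> str:
    """Run a git command and return its stdout, raising if git fails."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout


def build_review_package(task_id: str, base: str, head: str, cwd: str = ".") -> ReviewPackage:
    """Collect the commit list and diff for a single task's commit range."""
    commit_range = f"{base}..{head}"
    commits = git("log", "--oneline", "--no-decorate", commit_range, cwd=cwd).splitlines()
    diff = git("diff", "--stat", "--patch", commit_range, cwd=cwd)
    return ReviewPackage(task_id, base, head, commits, diff)
```

The point of the shape is that the coordinator pays for two git calls up front, and the reviewer starts with the exact range it's supposed to look at instead of rediscovering it one command at a time.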

As I was going to bed that evening, I told Fable to see about shaving another 15% off wall clock time and token cost for our evals while I slept. 

I also posted a note on our internal Slack suggesting that we evaluate what happens if you combine the code reviewer and the spec compliance reviewer.

I don't really know what I expected to happen overnight, but I don't think it was waking up to find that Fable had independently come to the same conclusion, tested it, and found that across our eval suite, it saved that additional 15% I'd asked for.


![pasted image 20260615 200048](/assets/2026/06/pasted-image-20260615-200048.png)

The next night, I got a little more ambitious.

```
/goal once this is done, run an autoresearch loop to improve cost-efficiency of the superpowers build loop. test with opus as the coordinator. make an hypothesis log. run experiments. run at least 25 experiments.

```

Fable built out a full autoresearch harness and ran overnight. You can check out what it built [on GitHub](https://github.com/prime-radiant-inc/superpowers-autoresearch).

```
**The autoresearch loop is complete: 25 experiments closed** (plus 4 logged as backlog), opus as coordinator throughout, every hypothesis pre-registered with a prediction in the log, now durable at evals `docs/experiments/2026-06-11-build-loop-autoresearch.md` (a76ddc9). Loop spend ≈ $165.

 What the loop found

**The ship candidate (E27):** opus controller + elicited plan + conditional haiku implementers + terse reviewer contract + narration recipe + final-review tier pin → **fractals $6.24/$6.60** (the combo config on hand plans was $11.67–14.84). Planted-defect gates 2/3, with the one fail forensically attributed to reviewer variance plus judge strictness — the terse contract was explicitly exonerated by reading the reviewer's reports. The N=5 gate battery is still owed before any of it ships as skill text.

**Wins with numbers:** terse reviewer contract (−41% reviewer output, verdicts intact), narration recipe (−54%, zero variance), conditional implementer tiering (~$0.5–1/run, and E22 proved it correctly _refuses_ haiku on prose plans), fixture-realism replicated on svelte (−24% scope-matched).

**Things now provably dead:** capping controller thinking **backfires** — turns rose 92→138 and output doubled; thinking buys turn efficiency. Plan word budgets slash test content −62% even when code is exempted. Sonnet plan generation keeps fidelity but collapses task structure. Implementation bodies in plans are marginal — tests + interfaces + structure carry the whole load (E28).

**Risk finding worth remembering:** reviewers given only the diff package produce _confident_ spec verdicts that silently redefine "spec" as the global constraints — 0/5 flagged the missing brief. Same failure family as the haiku-reviewer advocacy.

**Six leads closed as already-optimal** (report reads, cache health, reviewer floor, haiku fixers, todo bookkeeping, dispatch re-derivation) — recorded so nobody re-buys them.

Three of my own measurement bugs were caught and corrected mid-loop by the manual-inspection discipline (a grep counting template echoes as self-review catches, a harness that never inlined the diff, a scorer regex missing newlines) — one retracted verdict was re-measured clean (−74% became the honest −41%).
```

The long and the short of it is that across about 36 hours of work and what would have been $650 of unsubsidized token spend, our Anthropic eval benchmarks were looking like we'd reduced wall-clock runtime for Superpowers builds by 50% and token spend by 60%.

And then we ran our evals against Codex. The results were not good. I'd worried that they might not show the same level of improvement, but they showed *no* improvement.

A few minutes of digging and we found the culprit. On Codex, the evals weren't yet sufficiently isolated from the host OS, so we were always benchmarking Superpowers 5.1.0.

A little bit of fiddling later... yup. Everything held up.


![pasted image 20260615 195959](/assets/2026/06/pasted-image-20260615-195959.png)

The biggest improvements came from combining the spec compliance and code quality review agents, pre-baking the review "packet" handed to the reviewers so they rarely need to run git, and changing the guidance we give to the orchestrator about what kind of agent you need for a given task.
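
If it helps to picture the combined reviewer, here's a rough sketch of one review pass reporting on both axes at once. This is an illustration, not the actual skill text or any real Superpowers data format: `Finding`, `CombinedReview`, and the `"spec"` / `"quality"` axis labels are names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    axis: str       # "spec" or "quality" (hypothetical labels)
    blocking: bool  # True if the change should not be accepted as-is
    note: str       # the reviewer's explanation


@dataclass
class CombinedReview:
    """Result of a single reviewer pass covering both review axes."""
    findings: list[Finding] = field(default_factory=list)

    def passes(self, axis: str) -> bool:
        """An axis passes when it has no blocking findings."""
        return not any(f.blocking and f.axis == axis for f in self.findings)

    @property
    def approved(self) -> bool:
        """The change is approved only if both axes pass."""
        return self.passes("spec") and self.passes("quality")


review = CombinedReview([
    Finding("spec", blocking=True, note="adds a CLI flag the task never asked for"),
    Finding("quality", blocking=False, note="test names could be more descriptive"),
])
print(review.passes("quality"), review.approved)
```

One reviewer reads the change once and answers both questions, rather than two subagents each loading the same diff to answer one.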

We've been working hard on our evals suite for Superpowers and without it, it would have been impossible to measure and test the changes we made. The suite is still relatively young, but it has meant that we're able to make and test changes to Superpowers across a variety of supported harnesses and to quantify what those changes do across a growing set of coding agents. You can find it at [https://github.com/prime-radiant-inc/superpowers-evals](https://github.com/prime-radiant-inc/superpowers-evals).

We're very proud of the improvements that we (and our robot buddies) have made in Superpowers 6. We think you're going to love the new version.

You can install it right now from [https://github.com/obra/superpowers](https://github.com/obra/superpowers). It'll start to percolate into the first party plugin marketplaces over the next couple of days.

PS: We're hiring! If you know someone who should be working on Superpowers full time, please share the posting with them: [https://primeradiant.com/jobs/superpowers-community-engineer/](https://primeradiant.com/jobs/superpowers-community-engineer/)
