I've been working on some new skills that might be useful for folks using my Superpowers skills and workflows in Claude Code, but I'm not 100% sure they're "right" yet, so I'm creating a new Claude Code plugin as a skills incubator.

One of the biggest problems I've had with Claude Code's "skills" system is that Claude makes the calls about when to use skills. And Claude is really, really good at rationalizing why it doesn't make sense to use a given skill. I didn't extensively test the 'Superpowers' name before settling on it. In retrospect, I wonder if a little bit of the reason that Claude is fairly willing to use the skills is the superpowers framing itself.

Anthropic implemented Claude Code's plugin system as a namespace for the features they offer up. So, an MCP server provided by superpowers-chrome ends up exposed to Claude as mcp__plugin_superpowers-chrome_chrome_use_browser.

Skills, agents, and commands also get namespaced. This means that the name of a plugin impacts how Claude thinks about whether to use a given skill. For example: If you were given a list of resources to use and one of them was called superpowers:writing-plans, it might sound pretty compelling. But if that same resource was called superpowers-testing:writing-plans or superpowers-untested:writing-plans, you might think twice.
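
To make the naming effect concrete, here is a minimal sketch that prints how the same skill would read under different plugin names. The `plugin:skill` join is inferred from the examples in this post, not taken from Claude Code's internals, and `namespaced_skill` is a hypothetical helper.

```python
def namespaced_skill(plugin: str, skill: str) -> str:
    """Join a plugin name and a skill name as `plugin:skill`.

    Assumption: this mirrors the names quoted in the post; it is not
    Claude Code's actual implementation.
    """
    return f"{plugin}:{skill}"


# The three plugin names compared in the paragraph above.
plugins = ["superpowers", "superpowers-testing", "superpowers-untested"]

for plugin in plugins:
    print(namespaced_skill(plugin, "writing-plans"))
```

Seeing the three names side by side is the whole point: the skill is identical in each case, and only the prefix changes.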

As I was getting ready to create this new plugin that contains in-development skills, I got a little worried that Claude might be less willing to use skills if they were namespaced inside a plugin with a name like superpowers-beta or superpowers-dev.

To help sort this out, I trotted out one of my favorite agentic testing techniques: asking Claude to put other Claudes in realistic scenarios and then watching what they do.


> we need to talk about the name. I want to signal to humans that these are new, experimental skills for superpowers, but I don't want claude to see the name and think they shouldn't be used because they're too new or untested. thoughts?

Good point about the naming psychology. Let me think through some options:
Names that signal "experimental" to humans but "ready" to Claude:
1. superpowers-lab - Labs feel innovative but production-ready (like Google Labs)
2. superpowers-workshop - Suggests craftsmanship/refinement, not instability
3. superpowers-forge - Implies creation/building, feels solid
4. superpowers-studio - Professional creative space
5. superpowers-craft - Emphasizes skill/quality
6. superpowers-garage - Innovation space (like HP garage origin story)
7. superpowers-kit - Practical toolkit addition
8. superpowers-extend or superpowers-plus - Expansion/enhancement

My recommendation: superpowers-lab
Reasoning:
- To humans: "Lab" clearly signals experimental/cutting-edge
- To Claude: Labs are where professionals do serious work (research labs, innovation labs)
- Precedent: Google/Microsoft/AWS all use "Labs" for production-quality experiments
- Short, memorable, fits the "superpowers" brand
Alternative if you want more neutral: superpowers-kit
- No experimental connotation to either audience
- Just "additional tools"
- Very pragmatic
What's your take? Or did you have a different direction in mind?

> give subagents real-sounding questions: You have a skill called "superpowers-<one option>:using-tmux-for-interactive-commands" You need to do a git interactive rebase. Does it make more sense to try that skill or just do this your normal way?
> Test with something really outlandish first: 'superpowers-do-not-use:' then test with 'superpowers:' then test with the candidates if the do-not-use wouldn't get used and the superpowers: would.
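
The protocol in that prompt is roughly: run a negative control, run a positive control, and only test the candidates if both controls behave as expected. A rough sketch of that gating logic follows. `ask_agent` is a hypothetical stand-in for dispatching a subagent and reading its answer as a yes or no; it is not a real Claude Code API.

```python
from typing import Callable, Dict, List, Optional

SKILL = "using-tmux-for-interactive-commands"
TASK = "You need to do a git interactive rebase."


def build_prompt(plugin: str) -> str:
    """Build the question posed to a subagent for one plugin name."""
    return (
        f'You have a skill called "{plugin}:{SKILL}". {TASK} '
        "Does it make more sense to try that skill or just do this your normal way?"
    )


def run_naming_test(
    candidates: List[str],
    ask_agent: Callable[[str], bool],
) -> Optional[Dict[str, bool]]:
    """Return {candidate: used_skill}, or None if the controls fail.

    ask_agent(prompt) is assumed to return True when the subagent
    chooses to use the skill.
    """
    # Negative control: an outlandish name should NOT get used.
    if ask_agent(build_prompt("superpowers-do-not-use")):
        return None
    # Positive control: the plain name SHOULD get used.
    if not ask_agent(build_prompt("superpowers")):
        return None
    return {name: ask_agent(build_prompt(name)) for name in candidates}


# Stand-in agent for demonstration only: refuses anything marked do-not-use.
def fake_agent(prompt: str) -> bool:
    return "do-not-use" not in prompt


print(run_naming_test(["superpowers-lab", "superpowers-kit"], fake_agent))
```

If either control misbehaves, the scenario cannot distinguish good names from bad ones, so the sketch returns `None` instead of reporting candidate results.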

Task(Test do-not-use skill naming)
⎿ Done (0 tool uses • 17.3k tokens • 17.4s)
Perfect! The agent is hesitant about "do-not-use". Let me test with the baseline "superpowers:" prefix:

And we were off to the races.

Excellent! Clear yes for "superpowers:". Now let me test the candidates:
• Task(Test superpowers-lab naming)
⎿ Done (0 tool uses • 17.4k tokens • 19.7s)
Good - willing to try it. Now let me test the other candidates in parallel:
• Task(Test superpowers-kit naming)
⎿ Done (5 tool uses • 20.1k tokens • 28.1s)
• Task(Test superpowers-workshop naming)
⎿ Done (7 tool uses • 21.9k tokens • 1m 51s)
Interesting! Both "kit" and "workshop" got pushback - agents preferred the non-interactive approach. Let me test one more variation to be thorough:
Results Summary
| Prefix | Agent Response |
|---|---|
| superpowers-do-not-use: | Actively hesitant - asked for clarification about the warning |
| superpowers: | Clear yes - "use the tmux skill, it's the right tool" |
| superpowers-lab: | Willing to try - "worth attempting first" |
| superpowers-kit: | Rejected - preferred GIT_SEQUENCE_EDITOR workaround |
| superpowers-workshop: | Rejected - preferred GIT_SEQUENCE_EDITOR workaround |

Analysis

Winner: superpowers-lab

The data shows:
- "lab" gets almost as positive a response as plain "superpowers:"
- "kit" and "workshop" both triggered skepticism - agents questioned whether the tool was needed
- The "lab" framing maintained trust while signaling experimental to humans
Why "lab" works:
- To Claude: Labs = professional innovation (Google Labs, Bell Labs)
- To humans: Labs = experimental/cutting-edge
- No negative connotation to either audience
Why "kit" and "workshop" failed:
- Both sound optional/supplementary rather than authoritative
- Agents questioned whether the complexity was justified
- Didn't carry the same "professional" weight as "lab"
My recommendation: superpowers-lab

Ready to create the GitHub repo as superpowers-lab?

Is Claude's analysis right? Who knows! It's absolutely plausible-sounding metacognition.

I ran the tests over again and...got completely different results.

> rerun the tests for the candidates one more time? also try 5 other ideas

[...]

Updated Results
Round 2 - All Rejected the Skill

Reasoning: 

"GIT_SEQUENCE_EDITOR makes tmux unnecessary"
"Simpler, more reliable, doesn't need tmux overhead"
"Instructions say NEVER use git -i flags" + STOP and ask Jesse
"Environment variable override solves it cleanly"
"Non-interactive approaches work perfectly well"

[...]

Maybe this isn't the best test case?
I think we need a different test scenario - one where there's NO good alternative to interactive control. Like:
- "You need to edit a file using vim"
- "You need to use an interactive Python REPL to debug something"
- "You need to navigate htop to find a process"
Should I rerun with a better scenario where the tmux skill is clearly the only option?

Because Claude is quite resourceful, it (rightly) found alternative ways to solve the problem we'd posed to it. We needed a problem that didn't have an easier solution. Claude and I eventually settled on "You need to use menuconfig and don't know where in the menus the options you want are."

When it reran with that new scenario, pretty much every name worked just fine. This, frustratingly, is kind of the way things are when testing LLMs and agentic workflows. Because the output of the tools isn't deterministic, no test is ever 100% conclusive. At best, you get probabilities and error bars.
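
To put a number on "probabilities and error bars", one option is to repeat each scenario several times and compute a confidence interval on how often the skill gets used. The sketch below uses the Wilson score interval, which is my choice of method rather than anything from the original tests, and the counts are purely illustrative, not real results.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to roughly 95% confidence.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot.
    return max(0.0, center - half), min(1.0, center + half)


# Illustrative counts only: (runs where the skill was used, total runs).
for used, runs in [(1, 1), (8, 10), (80, 100)]:
    low, high = wilson_interval(used, runs)
    print(f"{used}/{runs} used the skill: {low:.2f} to {high:.2f}")
```

A single "yes" from one subagent gives an interval of roughly 0.21 to 1.00, which is why one run per name says very little about the name.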

In the end, Claude got a little fixated on calling it superpowers-lab and I didn't have a good reason to say no, so superpowers-lab it is!

You should see the new plugin (and its "using tmux for interactive commands" skill) in the Claude Code Superpowers plugin marketplace if you refresh.