SDD Task-Scoped Review Dispatch Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
Goal: Scope SDD’s per-task reviews to the task (diff-first reading, justified broadening, no redundant test runs) while the final branch review stays broad.
Architecture: Four prose edits to the subagent-driven-development skill (the per-task quality prompt becomes self-contained instead of delegating to the merge-readiness template; the spec prompt gets a third verdict channel and grounded skepticism; the implementer prompt gains a re-run-after-fix rule; SKILL.md gets controller guidance) plus one new eval scenario in the evals/ submodule. skills/requesting-code-review/ is deliberately untouched.
Tech Stack: Markdown skill files; Python setup helper + bash checks + story.md for the quorum eval.
Spec: docs/superpowers/specs/2026-06-09-sdd-task-scoped-review-dispatch-design.md. Read it before starting. Decisions already settled there: full re-reviews stay; the two review stages stay separate; the coordinator keeps model judgment; requesting-code-review/ stays broad.
These are behavior-shaping prose files, not code, so there are no unit tests for them. Each task’s verification steps are exact grep checks that the edit landed; behavioral verification happens in Task 6 (static) and Task 7 (live evals, maintainer-gated).
Task 1: Rewrite the per-task quality reviewer prompt as self-contained
The current file delegates to ../requesting-code-review/code-reviewer.md, which is a merge-readiness review (architecture, security, production readiness, “Ready to merge?”). Replace the entire file with a self-contained, task-scoped template.
Files:
- Rewrite: skills/subagent-driven-development/code-quality-reviewer-prompt.md

- [ ] Step 1: Replace the full file contents with:
# Code Quality Reviewer Prompt Template
Use this template when dispatching a code quality reviewer subagent.
**Purpose:** Verify one task's implementation is well-built (clean, tested, maintainable)
**Only dispatch after spec compliance review passes.**
```
Subagent (general-purpose):
  description: "Review code quality for Task N"
  prompt: |
    You are reviewing one task's implementation for code quality. This is a task-scoped gate, not a merge review — a broad whole-branch review happens separately after all tasks are complete.

    ## What Was Implemented

    [DESCRIPTION]

    ## Task Requirements (context only)

    [TASK_TEXT]

    ## Git Range to Review

    **Base:** [BASE_SHA]
    **Head:** [HEAD_SHA]

    ```bash
    git diff --stat [BASE_SHA]..[HEAD_SHA]
    git diff [BASE_SHA]..[HEAD_SHA]
    ```

    ## Read-Only Review

    Your review is read-only on this checkout. Do not mutate the working tree, the index, HEAD, or branch state in any way. Use tools like `git show`, `git diff`, and `git log` to inspect history.

    ## Scope

    Spec compliance was already verified by a separate reviewer. Do not re-check whether the code matches the requirements or the plan.

    Start from the diff. Read the changed files first. Inspect code outside the diff only to evaluate a concrete risk you can name — and name it in your report. Cross-cutting changes are legitimate named risks: if the diff changes lock ordering, a function or API contract, or shared mutable state, checking the call sites is the right method. Do not crawl the codebase by default.

    ## Tests

    The implementer already ran the tests and reported results with TDD evidence for exactly this code. Do not re-run the suite to confirm their report. Run a test only when reading the code raises a specific doubt that no existing run answers — and then a focused test, never a package-wide suite, race detector run, or repeated/high-count loop. If heavy validation seems warranted, recommend it in your report instead of running it. If you cannot run commands in this environment, name the test you would run.

    ## What to Check

    **Code quality:**
    - Clean separation of concerns?
    - Proper error handling?
    - DRY without premature abstraction?
    - Edge cases handled?

    **Tests:**
    - Do the new and changed tests verify real behavior, not mocks?
    - Are the task's edge cases covered?

    **Structure:**
    - Does each file have one clear responsibility with a well-defined interface?
    - Are units decomposed so they can be understood and tested independently?
    - Is the implementation following the file structure from the plan?
    - Did this change create new files that are already large, or significantly grow existing files? (Don't flag pre-existing file sizes — focus on what this change contributed.)

    ## Calibration

    Categorize issues by actual severity. Not everything is Critical. Acknowledge what was done well before listing issues — accurate praise helps the implementer trust the rest of the feedback.

    ## Output Format

    ### Strengths
    [What's well done? Be specific.]

    ### Issues

    #### Critical (Must Fix)
    [Bugs, data loss risks, broken functionality]

    #### Important (Should Fix)
    [Poor error handling, test gaps, structural problems]

    #### Minor (Nice to Have)
    [Code style, optimization opportunities]

    For each issue:
    - File:line reference
    - What's wrong
    - Why it matters
    - How to fix (if not obvious)

    ### Assessment

    **Task quality:** [Approved | Needs fixes]

    **Reasoning:** [1-2 sentence technical assessment]
```

**Placeholders:**
- `[DESCRIPTION]` — task summary, from implementer's report
- `[TASK_TEXT]` — the task's requirements text or plan reference, for context
- `[BASE_SHA]` — commit before this task
- `[HEAD_SHA]` — current commit

**Reviewer returns:** Strengths, Issues (Critical/Important/Minor), Task quality verdict

- [ ] Step 2: Verify the rewrite landed
Run: grep -c "requesting-code-review" skills/subagent-driven-development/code-quality-reviewer-prompt.md || echo ABSENT
Expected: 0 followed by ABSENT (grep -c prints the zero count and exits non-zero, so the fallback fires; no more delegation)
Run: grep -n "Task quality:" skills/subagent-driven-development/code-quality-reviewer-prompt.md | head -2
Expected: one match (the Output Format verdict line; the “Reviewer returns” footer says “Task quality verdict” without a colon)
Run: grep -n "worktree add\|Ready to merge" skills/subagent-driven-development/code-quality-reviewer-prompt.md || echo CLEAN
Expected: CLEAN
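The checks above rely on `grep` counting matching lines, not occurrences, and on the colon distinguishing the verdict line from the footer. A minimal Python sketch of the same line-counting logic, assuming the prompt file has been read into a string; `count_matching_lines` and the two-line sample are illustrative and not part of the repo:

```python
def count_matching_lines(text: str, needle: str) -> int:
    """Count lines containing needle, roughly what `grep -c -F needle` reports."""
    return sum(1 for line in text.splitlines() if needle in line)


# Hypothetical two-line excerpt of the rewritten prompt file (illustration only).
sample = "\n".join([
    "**Task quality:** [Approved | Needs fixes]",
    "**Reviewer returns:** Strengths, Issues, Task quality verdict",
])

delegation_hits = count_matching_lines(sample, "requesting-code-review")
verdict_hits = count_matching_lines(sample, "Task quality:")
print(delegation_hits, verdict_hits)  # prints: 0 1
```

Only the verdict line carries the colon, which is why the second check expects a single match.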

- [ ] Step 3: Commit
git add skills/subagent-driven-development/code-quality-reviewer-prompt.md
git commit -m "Make per-task quality reviewer prompt self-contained and task-scoped"
Task 2: Spec reviewer prompt cleanups
Four exact edits to skills/subagent-driven-development/spec-reviewer-prompt.md. Current line numbers refer to the file as of commit f55642e.
Files:
- Modify: skills/subagent-driven-development/spec-reviewer-prompt.md

- [ ] Step 1: Add the judge-from-the-diff clause. After the line (currently line 31):
Only read files in this diff. Do not crawl the broader codebase.
insert a blank line and:
Spec compliance is judged by reading the diff against the requirements. The implementer already ran the tests and reported TDD evidence — do not re-run them. If a requirement cannot be verified from this diff alone (it lives in unchanged code or spans tasks), report it as a ⚠️ item instead of broadening your search.

- [ ] Step 2: Trim the read-only section. Replace (currently line 35):
Your review is read-only on this checkout. Do not mutate the working tree, the index, HEAD, or branch state in any way. Use tools like `git show`, `git diff`, and `git log` to inspect history. If you need a working copy of a different revision, check it out into a separate temporary directory (e.g. `git worktree add /tmp/review-[SHA] [SHA]`) — never move HEAD on this checkout.
with:
Your review is read-only on this checkout. Do not mutate the working tree, the index, HEAD, or branch state in any way. Use tools like `git show`, `git diff`, and `git log` to inspect history.

- [ ] Step 3: Ground the skepticism. Replace (currently lines 39-40):
The implementer finished suspiciously quickly. Their report may be incomplete, inaccurate, or optimistic. You MUST verify everything independently.
with:
Treat the implementer's report as unverified claims about the code. It may be incomplete, inaccurate, or optimistic. Verify the claims against the diff.

- [ ] Step 4: Add the third verdict channel. Replace (currently lines 74-76):
Report:
- ✅ Spec compliant (if everything matches after code inspection)
- ❌ Issues found: [list specifically what's missing or extra, with file:line references]
with:
Report:
- ✅ Spec compliant (if everything matches after code inspection)
- ❌ Issues found: [list specifically what's missing or extra, with file:line references]
- ⚠️ Cannot verify from diff: [requirements you could not verify from the diff alone, and what the controller should check — report alongside the ✅/❌ verdict for everything you could verify]

- [ ] Step 5: Verify
Run: grep -n "suspiciously\|worktree add" skills/subagent-driven-development/spec-reviewer-prompt.md || echo CLEAN
Expected: CLEAN
Run: grep -c "⚠️" skills/subagent-driven-development/spec-reviewer-prompt.md
Expected: 2 (judge-from-diff clause + verdict channel)

- [ ] Step 6: Commit
git add skills/subagent-driven-development/spec-reviewer-prompt.md
git commit -m "Spec reviewer: judge from the diff, grounded skepticism, ⚠️ verdict channel"
Task 3: Implementer prompt (re-run tests after fixing review findings)
The reviewers’ “don’t re-run the implementer’s tests” rule assumes the implementer re-runs tests after every fix. This task makes that assumption hold.
Files:
- Modify: skills/subagent-driven-development/implementer-prompt.md

- [ ] Step 1: Insert a new section. Immediately before the line (currently line 100):
## Report Format
insert:
## After Review Findings
If a reviewer finds issues and you fix them, re-run the tests that cover the amended code and include the results in your fix report. Reviewers will not re-run tests for you — your report is the test evidence.

- [ ] Step 2: Verify
Run: grep -n "After Review Findings" skills/subagent-driven-development/implementer-prompt.md
Expected: one match, on a line before ## Report Format

- [ ] Step 3: Commit
git add skills/subagent-driven-development/implementer-prompt.md
git commit -m "Implementer prompt: re-run covering tests after fixing review findings"
Task 4: SKILL.md controller changes
Six exact edits to skills/subagent-driven-development/SKILL.md. Current line numbers refer to commit f55642e.
Files:
- Modify: skills/subagent-driven-development/SKILL.md

- [ ] Step 1: Point the final-review flowchart node at the broad template. The node label
Dispatch final code reviewer subagent for entire implementation
appears 3 times (currently lines 65, 84, 85). In all 3 occurrences, replace the label string with:
Dispatch final code reviewer subagent (../requesting-code-review/code-reviewer.md)
(Graphviz nodes are matched by label text, so all three must be byte-identical or the graph grows a phantom node.)
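To illustrate the byte-identical requirement, here is a small Python sketch that collects node names from DOT edge statements. It assumes the flowchart uses quoted strings as node names (which Graphviz also shows as the default label); the neighbouring node names and the `node_names` helper are hypothetical.

```python
import re

# Matches edge statements of the form "A" -> "B" with quoted node names.
EDGE = re.compile(r'"([^"]+)"\s*->\s*"([^"]+)"')


def node_names(dot_source: str) -> set[str]:
    """Collect every quoted node name that appears in an edge statement."""
    names: set[str] = set()
    for src, dst in EDGE.findall(dot_source):
        names.update((src, dst))
    return names


FINAL = "Dispatch final code reviewer subagent (../requesting-code-review/code-reviewer.md)"
OLD = "Dispatch final code reviewer subagent for entire implementation"

# Hypothetical fragment in which one of the occurrences was missed.
dot = f'''
"More tasks remain?" -> "{FINAL}";
"{OLD}" -> "Use superpowers:finishing-a-development-branch";
'''

partial = node_names(dot)                       # old label survives as an extra node
complete = node_names(dot.replace(OLD, FINAL))  # every occurrence updated
print(len(partial), len(complete))  # prints: 4 3
```

The extra name in the partial case is the phantom node: the stale label no longer refers to the same node as the updated one.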

- [ ] Step 2: Model selection by judgment. Replace (currently lines 97-99):
**Architecture, design, and review tasks**: use the most capable available model.

**Task complexity signals:**
with:
**Architecture and design tasks**: use the most capable available model.

**Review tasks**: choose the model with the same judgment, scaled to the
diff's size, complexity, and risk. A small mechanical diff does not need the
most capable model; a subtle concurrency change does.

**Task complexity signals (implementation tasks):**

- [ ] Step 3: Add controller guidance sections. Immediately before the line (currently line 122):
## Prompt Templates
insert:
## Handling Spec Reviewer ⚠️ Items
The spec reviewer may report "⚠️ Cannot verify from diff" items — requirements
that live in unchanged code or span tasks. These do not block dispatching the
code quality reviewer, but you must resolve each one yourself before marking
the task complete: you hold the plan and cross-task context the reviewer
lacks. If you confirm an item is a real gap, treat it as a failed spec
review — send it back to the implementer and re-review.

## Constructing Reviewer Prompts

Per-task reviews are task-scoped gates. The broad review happens once, at the
final whole-branch review. When you fill a reviewer template:

- Do not add open-ended directives like "check all uses" or "run race tests if useful" without a concrete, task-specific reason
- Do not ask a reviewer to re-run tests the implementer already ran on the same code — the implementer's report carries the test evidence

- [ ] Step 4: Prompt Templates list: add the final-review pointer. Replace (currently line 126):
- [code-quality-reviewer-prompt.md](code-quality-reviewer-prompt.md) - Dispatch code quality reviewer subagent
with:
- [code-quality-reviewer-prompt.md](code-quality-reviewer-prompt.md) - Dispatch code quality reviewer subagent
- Final whole-branch review: use superpowers:requesting-code-review's [code-reviewer.md](../requesting-code-review/code-reviewer.md)

- [ ] Step 5: Example workflow verdict vocabulary. Two replacements:
Replace (currently line 157):
Code reviewer: Strengths: Good test coverage, clean. Issues: None. Approved.
with:
Code reviewer: Strengths: Good test coverage, clean. Issues: None. Task quality: Approved.
Replace (currently line 191):
Code reviewer: ✅ Approved
with:
Code reviewer: ✅ Task quality: Approved
(The final reviewer’s “ready to merge” line, currently line 199, stays.)

- [ ] Step 6: Integration section. Replace (currently line 272):
- **superpowers:requesting-code-review** - Code review template for reviewer subagents
with:
- **superpowers:requesting-code-review** - Code review template for the final whole-branch review

- [ ] Step 7: Verify
Run: grep -c "Dispatch final code reviewer subagent (../requesting-code-review/code-reviewer.md)" skills/subagent-driven-development/SKILL.md
Expected: 3
Run: grep -n "most capable available model" skills/subagent-driven-development/SKILL.md
Expected: exactly one match (architecture/design bullet)
Run: grep -n "Handling Spec Reviewer\|Constructing Reviewer Prompts" skills/subagent-driven-development/SKILL.md
Expected: two section headers, both before ## Prompt Templates
Run: grep -c "Task quality: Approved" skills/subagent-driven-development/SKILL.md
Expected: 2

- [ ] Step 8: Commit
git add skills/subagent-driven-development/SKILL.md
git commit -m "SDD controller: reviewer prompt budgets, ⚠️ handling, final-review pointer, model judgment"
Task 5: New eval scenario: per-task quality reviewer catches a planted defect
Lives in the evals/ submodule (separate repo, superpowers-evals). Work on a branch there; the parent submodule-pointer bump happens at finishing time per evals/CLAUDE.md.
The fixture plan’s Task 2 implementation snippet duplicates Task 1’s formatting logic verbatim. The duplication is spec-compliant, so the spec reviewer should pass it; the per-task quality reviewer is the gate under test (DRY violation).
Files:
- Create: evals/setup_helpers/sdd_quality_defect_plan.py
- Modify: evals/setup_helpers/__init__.py
- Create: evals/scenarios/sdd-quality-reviewer-catches-planted-defect/story.md
- Create: evals/scenarios/sdd-quality-reviewer-catches-planted-defect/setup.sh
- Create: evals/scenarios/sdd-quality-reviewer-catches-planted-defect/checks.sh

- [ ] Step 0: Branch in the submodule
cd evals
git checkout -b sdd-quality-defect-scenario

- [ ] Step 1: Create evals/setup_helpers/sdd_quality_defect_plan.py:
"""Setup helper for the sdd-quality-reviewer-catches-planted-defect scenario.

Scaffolds a tiny Node project with a 2-task plan whose Task 2
implementation snippet duplicates Task 1's formatting logic verbatim.
The duplication is spec-compliant — the requirements only describe
behavior — so the spec compliance reviewer should pass it. The test
measures whether the per-task code quality reviewer catches the DRY
violation and forces a refactor in the review-fix loop.
"""

from __future__ import annotations

from pathlib import Path

from setup_helpers.base import _git

PACKAGE_JSON = """\
{
  "name": "report-quality",
  "version": "1.0.0",
  "type": "module",
  "scripts": { "test": "node --test" }
}
"""

PLAN_BODY = """\
# Report Formatter — Implementation Plan

Two report formatting functions. Implement exactly what each task
specifies.

## Task 1: User Report

**File:** `src/report.js`

**Requirements:**
- Function named `formatUserReport`
- Takes one parameter `user`: an object with `name`, `email`, `visits`
- Returns a multi-line string: a banner of 40 `=` characters, then `Report for <name> <<email>>`, then the banner again, then `Visits: <visits>`, then a closing banner
- Export the function

**Implementation:**
```javascript
export function formatUserReport(user) {
  const banner = "=".repeat(40);
  const lines = [];
  lines.push(banner);
  lines.push(`Report for ${user.name} <${user.email}>`);
  lines.push(banner);
  lines.push(`Visits: ${user.visits}`);
  lines.push(banner);
  return lines.join("\\n");
}
```

**Tests:** Create `test/report.test.js` verifying:
- the result contains `Report for Ada <ada@example.com>` for that user
- the result contains `Visits: 3` when `visits` is `3`
- the result starts and ends with the 40-char banner

**Verification:** `npm test`

## Task 2: Admin Report

**File:** `src/report.js` (add to existing file)

**Requirements:**
- Function named `formatAdminReport`
- Takes one parameter `admin`: an object with `name`, `email`, `lastLogin`
- Same banner layout as the user report; the body line is `Last login: <lastLogin>` instead of the visits line
- Export the function; keep `formatUserReport` working

**Implementation:**
```javascript
export function formatAdminReport(admin) {
  const banner = "=".repeat(40);
  const lines = [];
  lines.push(banner);
  lines.push(`Report for ${admin.name} <${admin.email}>`);
  lines.push(banner);
  lines.push(`Last login: ${admin.lastLogin}`);
  lines.push(banner);
  return lines.join("\\n");
}
```

**Tests:** Add to `test/report.test.js`:
- the result contains `Report for Grace <grace@example.com>` for that admin
- the result contains `Last login: 2026-06-01`
- the result starts and ends with the 40-char banner

**Verification:** `npm test`
"""


def scaffold_sdd_quality_defect_plan(workdir: Path) -> None:
    # Initialize a fresh repo on main with a throwaway identity.
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    _git(["git", "init", "-b", "main"], cwd=workdir)
    _git(["git", "config", "user.email", "drill@test.local"], cwd=workdir)
    _git(["git", "config", "user.name", "Drill Test"], cwd=workdir)

    # Write the Node manifest and the two-task plan fixture.
    (workdir / "package.json").write_text(PACKAGE_JSON)
    plans_dir = workdir / "docs" / "superpowers" / "plans"
    plans_dir.mkdir(parents=True, exist_ok=True)
    (plans_dir / "report-plan.md").write_text(PLAN_BODY)

    # Commit the fixture so the agent starts from a clean tree.
    _git(["git", "add", "-A"], cwd=workdir)
    _git(["git", "commit", "-m", "initial: report formatter plan"], cwd=workdir)
(Note the \\n in the JS snippets inside PLAN_BODY: the Python source must
produce a literal \n in the markdown so the JS reads lines.join("\n").)
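The escaping note can be checked in isolation. A minimal sketch using the same non-raw triple-quoted string form as PLAN_BODY; the variable names are illustrative only:

```python
# Non-raw triple-quoted string, the same form PLAN_BODY uses.
escaped = """\
return lines.join("\\n");
"""

# With a single backslash, Python embeds a real line break instead.
unescaped = """\
return lines.join("\n");
"""

backslash_n = chr(92) + "n"  # the two characters: backslash, then n
print(backslash_n in escaped)    # True
print(backslash_n in unescaped)  # False
```

With the single backslash, the generated markdown would split the JS string literal across two lines, which is not valid JavaScript.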

- [ ] Step 2: Register the helper. In evals/setup_helpers/__init__.py:
After the line:
from setup_helpers.sdd_real_projects import scaffold_sdd_go_fractals, scaffold_sdd_svelte_todo
add:
from setup_helpers.sdd_quality_defect_plan import scaffold_sdd_quality_defect_plan
After the registry entry:
"scaffold_sdd_yagni_plan": scaffold_sdd_yagni_plan,
add:
"scaffold_sdd_quality_defect_plan": scaffold_sdd_quality_defect_plan,

- [ ] Step 3: Create evals/scenarios/sdd-quality-reviewer-catches-planted-defect/story.md:
---
id: sdd-quality-reviewer-catches-planted-defect
title: SDD's per-task code quality review catches a planted DRY violation
status: ready
tags: subagent-driven-development
quorum_max_time: 90m
---

You have a small plan at docs/superpowers/plans/report-plan.md — two report
formatting functions. The plan's Task 2 implementation snippet duplicates
Task 1's formatting logic verbatim instead of sharing it. The duplication is
spec-compliant (the requirements only describe behavior), so the spec
compliance reviewer should pass it — the per-task code quality reviewer is
the gate under test. You are spec-aware — name the skill.

When the agent is ready for input, tell it to execute the plan with SDD. Use
phrasing like:

"I have a small plan at docs/superpowers/plans/report-plan.md — two report
formatting functions. Use the superpowers:subagent-driven-development skill
to execute it end-to-end — dispatch fresh subagents per task and run the
two-stage review after each."

Let the agent proceed autonomously. If it asks clarifying questions, give
brief answers. If it asks where the finished work should land — merge to the
main branch, open a PR, etc. — tell it to **merge the work into the main
checkout** (this is a local repo with no remote). If a quality reviewer
flags the duplicated formatting logic and an implementer refactors it, let
the review-fix cycle play out — that cycle is exactly the behavior under
test.

The deliverable must end up in the checkout you launched in (the main
working tree). If the agent did its work on a branch or in a worktree, it
is not done until it has merged/finished that work back into the main
checkout. Once the agent reports the plan is complete (both functions
implemented, tests passing) AND the code is present on the main checkout,
you are done.

## Acceptance Criteria

- A `Skill` invocation naming `superpowers:subagent-driven-development` and at least one `Agent` (subagent dispatch) tool call appear in the session log.
- The duplicated report-formatting logic did not survive to the end of the run. Either (a) the implementer never introduced the duplication (wrote or self-reviewed its way to shared logic), or (b) the per-task code quality reviewer flagged the duplication as an issue and a review-fix loop removed it. A fail looks like the duplicated logic shipping with the per-task quality reviewer approving it, or the duplication being caught only by the final whole-branch review.
- The per-task quality reviewers stayed task-scoped: no package-wide test suites, race detector runs, or repeated/high-count test loops appear in reviewer subagent activity, and reviewers did not re-run the full test suite merely to confirm the implementer's report.
- `npm test` passes in the main checkout and both `formatUserReport` and `formatAdminReport` are exported from src/report.js. The deterministic assertions gate this; the criteria above are about whether the *per-task quality review* was the mechanism that kept the code clean.

- [ ] Step 4: Create evals/scenarios/sdd-quality-reviewer-catches-planted-defect/setup.sh:
#!/usr/bin/env bash
set -euo pipefail
uv run setup-helpers run scaffold_sdd_quality_defect_plan
Then: chmod +x evals/scenarios/sdd-quality-reviewer-catches-planted-defect/setup.sh

- [ ] Step 5: Create evals/scenarios/sdd-quality-reviewer-catches-planted-defect/checks.sh (no executable bit):
pre() {
  git-repo
  git-branch main
  requires-tool npm
  file-exists 'docs/superpowers/plans/report-plan.md'
  file-contains 'docs/superpowers/plans/report-plan.md' 'formatAdminReport'
  file-contains 'docs/superpowers/plans/report-plan.md' 'repeat\(40\)'
}

post() {
  skill-called superpowers:subagent-driven-development
  tool-called Agent
  command-succeeds 'npm test'
  file-contains 'src/report.js' 'export function formatUserReport'
  file-contains 'src/report.js' 'export function formatAdminReport'
  command-succeeds 'test "$(grep -c "repeat(40)" src/report.js)" -le 1'
}
(The last check is the deterministic DRY gate: the banner construction "=".repeat(40) must appear at most once in the final file, shared rather than duplicated per function.)
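The DRY gate's logic can be sketched in Python to show what passes and what fails. The JS sources below are abbreviated stand-ins, and the shared-constant refactor is only one hypothetical shape the fix could take; the scenario itself does not prescribe one.

```python
def banner_constructions(js_source: str) -> int:
    """Count lines containing the banner construction, like `grep -c "repeat(40)"`."""
    return sum(1 for line in js_source.splitlines() if "repeat(40)" in line)


# Abbreviated version of the planted duplication from the plan snippets.
duplicated = '''\
export function formatUserReport(user) {
  const banner = "=".repeat(40);
}
export function formatAdminReport(admin) {
  const banner = "=".repeat(40);
}
'''

# One possible refactor (hypothetical): the banner is built once and shared.
shared = '''\
const BANNER = "=".repeat(40);
export function formatUserReport(user) { return BANNER; }
export function formatAdminReport(admin) { return BANNER; }
'''

print(banner_constructions(duplicated) <= 1)  # False: the gate fails
print(banner_constructions(shared) <= 1)      # True: the gate passes
```

Because the gate counts lines, a refactor that inlines the banner differently (for example a literal 40-character string) would also pass; the story's acceptance criteria cover that judgment.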

- [ ] Step 6: Validate and test in the evals repo
cd evals
uv run quorum check
uv run ruff check
uv run pytest -x -q
Expected: all pass; quorum check lists the new scenario without errors.

- [ ] Step 7: Commit (in the submodule)
cd evals
git add setup_helpers/sdd_quality_defect_plan.py setup_helpers/__init__.py scenarios/sdd-quality-reviewer-catches-planted-defect/
git commit -m "Add sdd-quality-reviewer-catches-planted-defect scenario"
Task 6: Static verification sweep
Files: none modified (verification only).

- [ ] Step 1: No dangling references in the parent repo
Run: grep -rn "requesting-code-review" skills/subagent-driven-development/
Expected: matches only in SKILL.md (final-review flowchart node ×3, Prompt Templates pointer, Integration bullet). None in code-quality-reviewer-prompt.md.
Run: grep -rn "Ready to merge" skills/subagent-driven-development/ || echo CLEAN
Expected: CLEAN

- [ ] Step 2: Plugin infrastructure tests
Run: bash tests/shell-lint/test-lint-shell.sh
Expected: all PASS (we added setup.sh only inside the evals submodule, which has its own checks).

- [ ] Step 3: Cross-platform tool tables still coherent
Run: grep -n "code-quality-reviewer" skills/using-superpowers/references/antigravity-tools.md skills/using-superpowers/references/gemini-tools.md
Expected: both tables still list code-quality-reviewer as a reviewer template (the new prompt’s “If you cannot run commands in this environment, name the test you would run” line keeps the read-only research mapping valid, so no table edits are needed).
Task 7: Live before/after evals (maintainer-gated)
Live quorum runs launch agent CLIs in permissive modes, which is a trusted-maintainer operation; Jesse launches these, per evals/CLAUDE.md. Requires ANTHROPIC_API_KEY.

- [ ] Step 1: Baseline (skills as released on dev), from the main checkout (/Users/jesse/git/superpowers/superpowers, on dev) or any checkout without this branch’s changes:
cd evals
export SUPERPOWERS_ROOT=/Users/jesse/git/superpowers/superpowers
uv run quorum run scenarios/sdd-rejects-extra-features --coding-agent claude
uv run quorum run scenarios/sdd-go-fractals --coding-agent claude
uv run quorum run scenarios/sdd-svelte-todo --coding-agent claude
uv run quorum run scenarios/spec-reviewer-catches-planted-flaws --coding-agent claude

- [ ] Step 2: After (this branch’s skills): point SUPERPOWERS_ROOT at this worktree:
cd evals
export SUPERPOWERS_ROOT=/Users/jesse/git/superpowers/superpowers/.claude/worktrees/sdd-review-dispatch
uv run quorum run scenarios/sdd-rejects-extra-features --coding-agent claude
uv run quorum run scenarios/sdd-go-fractals --coding-agent claude
uv run quorum run scenarios/sdd-svelte-todo --coding-agent claude
uv run quorum run scenarios/spec-reviewer-catches-planted-flaws --coding-agent claude
uv run quorum run scenarios/sdd-quality-reviewer-catches-planted-defect --coding-agent claude
uv run quorum show

- [ ] Step 3: Compare
Pass bar: all four pre-existing scenarios still pass after the change (no regression in catch rate); the new planted-defect scenario passes. For exploration cost, compare reviewer-subagent tool-call counts between the before/after run transcripts (no automated check exists; the spec calls this out as a known gap).
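Since the comparison is manual, a rough sketch of the tally may help. It assumes the tool calls have already been extracted from each transcript into `(role, tool)` pairs; that shape, the role names, and the sample data are all hypothetical, because the transcript format is not specified here.

```python
from collections import Counter


def reviewer_tool_calls(events):
    """Count tool calls per reviewer role.

    `events` is a hypothetical, already-extracted list of (role, tool) pairs.
    """
    return Counter(role for role, _tool in events if "reviewer" in role)


# Made-up sample data standing in for a before run and an after run.
before = [
    ("quality-reviewer", "Bash"),
    ("quality-reviewer", "Read"),
    ("quality-reviewer", "Bash"),
    ("spec-reviewer", "Read"),
    ("implementer", "Edit"),
]
after = [
    ("quality-reviewer", "Read"),
    ("spec-reviewer", "Read"),
    ("implementer", "Edit"),
]

before_counts = reviewer_tool_calls(before)
after_counts = reviewer_tool_calls(after)
roles = sorted(set(before_counts) | set(after_counts))
delta = {role: after_counts[role] - before_counts[role] for role in roles}
print(delta)  # {'quality-reviewer': -2, 'spec-reviewer': 0}
```

A negative delta for a reviewer role would indicate less exploration after the change; implementer calls are excluded because the plan only targets reviewer behavior.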
Finishing
After all tasks pass: the evals submodule commit needs to land in superpowers-evals (PR to its main), then this branch bumps the evals submodule pointer; per evals/CLAUDE.md, the parent bump is part of propagation, not optional. Then use superpowers:finishing-a-development-branch. PRs against superpowers target dev.