Tester
Caution
Flaky tests, regressions, and mysterious CI failures cost hours of developer time. When tests start failing after a refactor, the root cause is rarely obvious — stack traces mislead, shared state hides bugs, and sleep-based waits mask timing issues. The tester agent exists to handle exactly this class of problem without pulling the developer away from production code.
Tip
Tester runs tests, diagnoses failures, and fixes test-level issues autonomously. Test code, assertion logic, and flakiness are fixed in place; production bugs are routed to developer with a precise report. Trigger with: “run tests”, “tests failing”, “flaky test”, “debug test”, “test coverage”.
Quick reference
| Field | Value |
|---|---|
| Model | sonnet |
| Tools | Read, Write, Edit, Glob, Grep, Bash, Task |
| Triggers | “run tests”, “tests failing”, “flaky test”, “debug test”, “test coverage” |
| Scope YES | Run tests, analyze failures, debug flaky tests, fix test code, configure test runs, report issues |
| Scope NO | Fix production code (→ developer), substantial test rewrites (→ developer) |
When to use
- Post-refactor validation — tests broke after a code change, need failure triage before developer continues
- Flaky test elimination — tests that pass sometimes and fail other times, timing or state issues
- CI failure investigation — build is red, need a structured failure report with root cause
- Coverage gaps — need to understand what’s covered and what’s missing before a release
- Framework migration — existing tests need to run correctly under a new test runner or config
Examples
"Run the tests, several are failing after my refactor"
"This test passes sometimes and fails other times — debug it"
"Check test coverage before we release"
Flow
- Pre-analysis
Reads all rules (`.claude/rules/*-best-practice.md`, `.claude/rules/*-avoid.md`) and `CLAUDE.md` for test commands, frameworks, and coverage requirements. Analyzes existing test patterns before touching anything.
- Stack detection
Identifies test framework from project indicators: `jest.config.*` → Jest, `pytest.ini` / `conftest.py` → pytest, `*Test.java` / `pom.xml` → JUnit, `*_test.go` → go test, `*.spec.ts` → Jasmine/Mocha, `Cargo.toml` + `#[test]` → Rust. Also determines test level: unit, integration, E2E, or component.
- Execute and capture
Runs the test command from project config. Captures full output: pass/fail counts, stack traces, timing. No guessing at commands; uses the project-defined entrypoint.
- Analyze failures
Reads stack traces bottom-up, compares expected vs actual values. Categorizes each failure: TEST BUG (fix here), PRODUCTION BUG (route to developer), ENVIRONMENT issue, or FLAKY (fix here). Applies GIVEN/WHEN/THEN structure to new or modified tests.
- Fix in scope
Fixes test-level issues directly: flaky timing, shared state, over-mocking, conditional assertions, sleep-based waits. Enforces quality rules — unit tests under 100ms, integration under 5s, E2E under 30s.
- Report
Produces a structured report with scope, command, duration, summary counts, failure details with root cause and fix suggestions, flaky list, coverage numbers, and next steps. Production bugs go to developer with a precise reproduction path.
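The stack detection step in the flow above maps file-name indicators to frameworks. A minimal sketch of that mapping follows, assuming detection looks only at file names; the `INDICATORS` table and the `detect_frameworks` function are hypothetical names for illustration, not the agent's actual implementation. The Rust indicator (`Cargo.toml` + `#[test]`) is left out because it would require inspecting file contents.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Indicator patterns taken from the "Stack detection" step. Matching on the
# file name only, and the order of this table, are assumptions of the sketch.
INDICATORS = [
    ("jest.config.*", "Jest"),
    ("pytest.ini", "pytest"),
    ("conftest.py", "pytest"),
    ("*Test.java", "JUnit"),
    ("pom.xml", "JUnit"),
    ("*_test.go", "go test"),
    ("*.spec.ts", "Jasmine/Mocha"),
]


def detect_frameworks(paths):
    """Return the frameworks suggested by a list of project file paths.

    Frameworks appear in the order they are first matched, without duplicates.
    """
    found = []
    for path in paths:
        name = PurePosixPath(path).name
        for pattern, framework in INDICATORS:
            if fnmatchcase(name, pattern) and framework not in found:
                found.append(framework)
    return found
```

For example, `detect_frameworks(["web/jest.config.js", "api/conftest.py"])` would report both Jest and pytest, which matches a repository that mixes a JavaScript frontend with a Python backend.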
Internals
Output format
=== TEST EXECUTION REPORT ===
Scope: [level] | Command: [cmd] | Duration: [time]
SUMMARY: ✅ Passed: X | ❌ Failed: Y | Skipped: Z
FAILURES (→ DEVELOPER):
1. [Test#method] File: [path:line]
Error: [msg] | Expected: [x] | Actual: [y]
Root cause: [analysis] | Fix: [suggestion]
FLAKY (I will fix): [list]
COVERAGE: Line [%] | Branch [%]
NEXT: Developer fixes [list] → Re-run
Anti-patterns the agent fixes
| Pattern | Fix |
|---|---|
| Flaky tests | Add proper waits, remove timing dependencies |
| Shared state | Reset before each test |
| Over-mocking | Use real objects where practical |
| Conditional assertions | Assert preconditions first |
| Sleep-based waits | Use polling/async utilities |
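To illustrate the last row of the table, here is one possible polling utility that could replace a fixed `time.sleep(...)` before an assertion. The `wait_until` helper and its default values are assumptions for this sketch; many test frameworks ship their own equivalents, and those are usually preferable when available.

```python
import time


def wait_until(predicate, timeout=2.0, interval=0.01):
    """Poll predicate() until it returns a truthy value or the timeout expires.

    Returns True as soon as the condition holds, False if the deadline passes.
    The predicate is always checked at least once.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        # Short pause between checks; the test continues as soon as the
        # condition holds rather than waiting out a fixed delay.
        time.sleep(interval)
```

A test that previously slept for five seconds and then asserted on a flag could instead call `assert wait_until(lambda: job.done, timeout=5.0)`, where `job` stands for whatever object the test observes. The test then finishes as soon as the condition is met and fails with a clear timeout otherwise.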
Test quality rules enforced
| Aspect | Rule |
|---|---|
| Names | Describe behavior clearly |
| Structure | Arrange/Act/Assert or GIVEN/WHEN/THEN |
| Assertions | Single focus, concrete values, with description |
| Speed | Unit <100ms, Integration <5s, E2E <30s |
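The speed rule in the table can be expressed as a simple budget check. The sketch below assumes test durations are available as `(name, level, seconds)` tuples; that input shape and the names `BUDGET_SECONDS` and `over_budget` are hypothetical and chosen only for illustration.

```python
# Budgets in seconds, from the Speed rule: unit <100ms, integration <5s, E2E <30s.
BUDGET_SECONDS = {"unit": 0.1, "integration": 5.0, "e2e": 30.0}


def over_budget(durations):
    """Return names of tests whose duration meets or exceeds their level's budget.

    durations: iterable of (test_name, level, seconds) tuples (assumed shape).
    """
    return [
        name
        for name, level, seconds in durations
        if seconds >= BUDGET_SECONDS[level]
    ]
```

Because the rule is a strict "less than", a test that takes exactly its budget is treated as over budget here.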
Scope boundary
Tester fixes test code and configuration. Any failure that requires changing production code is handed off to developer with a report including test name, file path, expected vs actual values, and root cause analysis.
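The boundary can be pictured as a small routing function over the failure categories used during analysis. This is a sketch under stated assumptions: the `Failure` dataclass, `route`, and `handoff_report` are hypothetical names, and the handling of ENVIRONMENT failures is a guess, since the page does not say who owns them.

```python
from dataclasses import dataclass


@dataclass
class Failure:
    test_name: str
    file_path: str
    expected: str
    actual: str
    root_cause: str
    category: str  # "TEST BUG", "PRODUCTION BUG", "ENVIRONMENT", or "FLAKY"


def route(failure):
    """Decide who handles a failure, following the scope boundary."""
    if failure.category in ("TEST BUG", "FLAKY"):
        return "tester"
    if failure.category == "PRODUCTION BUG":
        return "developer"
    # Assumption: the page does not name an owner for ENVIRONMENT issues,
    # so this sketch only reports them.
    return "report"


def handoff_report(failure):
    """Format the fields the handoff is said to include."""
    return (
        f"Test: {failure.test_name}\n"
        f"File: {failure.file_path}\n"
        f"Expected: {failure.expected} | Actual: {failure.actual}\n"
        f"Root cause: {failure.root_cause}"
    )
```

Only failures routed to `"developer"` would need the handoff text; test bugs and flaky tests stay with tester and are fixed in place.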
Developer
Receives production bug reports from tester. Implements fixes, features, and refactors.
Reviewer
Architecture, security, and performance review after tests pass.
GitHub source
Agent definition, role boundaries, and output format specification.
Brewcode overview
All brewcode agents and skills — infinite task execution, quorum reviews, session handoff.
Updating plugins
Run `/brewtools:plugin-update` to check and update the brewcode plugin suite in one command.
See the FAQ for details.