# Rate limits hit. Keep coding.
> [!CAUTION]
> Anthropic Max costs $100-200/month, yet hard rate limits still run out mid-workday. Thursday evening, Friday morning: locked out. Premium paid, zero output.
> [!TIP]
> One command fixes it. Run `claudeglm` and Claude Code switches to GLM-5.1, the #1 model on SWE-bench Pro. Cost: ~$10/month backup budget. New terminal = back on Anthropic.
## Your week on Anthropic Max
- **Monday.** Fresh cycle. Claude Code at full speed. Everything works.
- **Wednesday.** Heavy refactor. Deep in code. Limits at 60%.
- **Thursday evening.** Rate limit hit. Claude Code stops responding. Stuck.
- **Friday without backup.** Full workday lost. Waiting for the next cycle. $200 paid, no output.
- **Friday with provider-switch.** `claudeglm`, and you're back to coding in 3 seconds. Same quality. $0.04 per session.
## The math
- **Without backup:** $100-200/month subscription. Limits hit Thursday. Friday = zero productivity.
- **With provider-switch:** the same subscription + ~$10/month on backup. Limits hit Thursday. Friday = full workday. Zero downtime.
## One command. That’s it.
```sh
claudeglm       # GLM-5.1 — #1 SWE-bench Pro, 200K ctx
claudeqwen      # Qwen 3.6 Plus — 1M context window
claudeminimax   # MiniMax M2.7 — cheapest per token
claudeor        # OpenRouter — 200+ models, pick any
```
> [!TIP]
> Each alias sets env vars and launches Claude Code automatically. To return to your Anthropic subscription, open a new terminal.
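Under the hood, an alias presumably looks something like the sketch below. The endpoint URL and key are placeholder assumptions, not the skill's exact output; the skill writes the real values to `~/.zshrc` for you.

```sh
# Hypothetical shape of a claudeglm-style alias; the base URL and key
# placeholder are assumptions, not the skill's exact output.
claudeglm() {
  export ANTHROPIC_BASE_URL="https://api.z.ai/api/anthropic"  # assumed provider endpoint
  export ANTHROPIC_AUTH_TOKEN="sk-your-provider-key"          # Bearer token auth
  export ANTHROPIC_API_KEY=""                                 # empty: blocks the OAuth fallback
  export ANTHROPIC_MODEL="glm-5.1"                            # route requests to GLM
  claude "$@"                                                 # launch Claude Code
}
```

Because the exports live only in the shell where the function runs, opening a new terminal drops you straight back onto your Anthropic subscription.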
## Setup in 2 minutes
1. **Get an API key.** Sign up at your chosen provider (links below). Top up $5-10. Takes 30 seconds.
2. **Run the skill.** `/brewtools:provider-switch setup` — paste your key when asked. The alias is written to `~/.zshrc` automatically.
3. **Done.** Next time limits hit, type `claudeglm`. Back to coding.
## Providers
### Z.ai / GLM

`glm-5.1` — $1.40 / $4.40 per 1M tokens. 200K context. #1 on SWE-bench; best coding performance per dollar.

### Qwen

`qwen3.6-plus` — ~$0.50 / $2.00 per 1M tokens. 1M context window. Best for large codebases.

### MiniMax

`minimax-m2.7` — $0.30 / $1.20 per 1M tokens. 200K context. Cheapest option, with verified prompt caching.

### OpenRouter

200+ models. Route to GLM, Qwen, Gemini, Llama. One config, any provider. Custom model IDs validated via API.
> [!CAUTION]
> **Qwen: Singapore region only.** The Anthropic-compatible endpoint works ONLY with API keys from the Singapore region. Open Model Studio → Singapore → API Key and create a key there. Valid format: `sk-...` (~36 chars). Keys from Frankfurt (`sk-ws-...`) return 403.
## Technical details

### How it works

Each alias sets 4-6 environment variables (`ANTHROPIC_BASE_URL`, auth key, model overrides) and runs `claude`. Claude Code reads these on startup and connects to the alternative provider instead of Anthropic.

Env vars persist only in the current shell session. Opening a new terminal resets everything, and Claude Code uses your Anthropic subscription again.
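That session scoping is ordinary shell behavior, easy to see with a subshell (the parentheses below play the role of the alias's terminal session):

```sh
unset ANTHROPIC_BASE_URL
# Inside the subshell the variable exists; the parent shell never sees it.
( export ANTHROPIC_BASE_URL="https://example.invalid"; echo "inside: $ANTHROPIC_BASE_URL" )
echo "outside: ${ANTHROPIC_BASE_URL:-unset}"   # prints "outside: unset"
```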
### Auth by provider

| Provider | Auth pattern | Notes |
|---|---|---|
| All providers | `ANTHROPIC_AUTH_TOKEN` + `ANTHROPIC_API_KEY=""` | Unified: Bearer token plus an empty API key blocks the OAuth fallback |
### Context window

Claude Code defaults to 200K for non-Anthropic providers. The `[1m]` suffix on model names (e.g., `qwen3.6-plus[1m]`) forces 1M context. Applied automatically by the skill.
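Concretely, the override presumably looks like this, using Claude Code's `ANTHROPIC_MODEL` variable; note the suffix is part of the model string itself, not a separate setting:

```sh
# 200K default window for non-Anthropic providers:
export ANTHROPIC_MODEL="qwen3.6-plus"
# [1m] appended to the model name forces the 1M window:
export ANTHROPIC_MODEL="qwen3.6-plus[1m]"
echo "$ANTHROPIC_MODEL"   # prints "qwen3.6-plus[1m]"
```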
### Skill arguments

| Argument | What it does |
|---|---|
| (none) | Show status table |
| `setup` | Interactive setup |
| `glm` / `qwen` / `minimax` / `openrouter` | Single-provider setup |
| `verify` | Test all configured tokens against endpoints |
| `model-check` | Identify which model is actually responding (run inside a provider session) |
| `help` | Switching cheat sheet |
### Prompt caching

Claude Code automatically sends `cache_control` markers to reduce input-token costs. Only input tokens can be cached; output tokens are always billed at the full rate, regardless of provider.
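For reference, a request body carrying such a marker looks roughly like the sketch below (Anthropic Messages API shape; the payload is illustrative, not a captured Claude Code request):

```sh
# Illustrative Messages-API body with an ephemeral cache marker on the
# system block; Claude Code emits these markers itself.
cat > /tmp/cached_request.json <<'EOF'
{
  "model": "minimax-m2.7",
  "max_tokens": 64,
  "system": [
    {
      "type": "text",
      "text": "Large shared project context goes here...",
      "cache_control": { "type": "ephemeral" }
    }
  ],
  "messages": [ { "role": "user", "content": "Summarize the context." } ]
}
EOF
grep -c '"cache_control"' /tmp/cached_request.json   # prints 1
```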
#### Provider support
| Provider | Prompt Cache | Status | Details |
|---|---|---|---|
| MiniMax | Yes | Verified | Fully documented — pricing, response fields, TTL. Recommended |
| Qwen | Claimed | Unverified | Field accepted per compatibility matrix. No cache pricing or response fields documented |
| Z.ai / GLM | Auto (backend) | Unverified | Native auto-cache may reduce costs on backend. Not visible to Claude Code |
| OpenRouter | Model-dependent | — | Anthropic models: yes. Non-Anthropic (Qwen, GLM): no |
#### MiniMax cache pricing (`minimax-m2.7`)
| Token type | Price / 1M | vs Standard input |
|---|---|---|
| Standard input | $0.30 | — |
| Cache write (first request) | $0.375 | +25% |
| Cache read (subsequent) | $0.06 | -80% |
Cache TTL: 5 minutes, auto-refreshed on each hit. Minimum: 512 input tokens.
#### What this means in practice

A typical Claude Code request (~250K input tokens) with a 90% cache hit rate:
| Provider | Cost per request | Without cache |
|---|---|---|
| MiniMax (cached) | ~$0.02 | $0.08 |
| Anthropic Opus 4.6 (cached) | ~$0.24 | $1.25 |
| Qwen (no confirmed cache) | $0.10-0.13 | $0.10-0.13 |
| Z.ai (no visible cache) | $0.35 | $0.35 |
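The MiniMax row follows directly from the cache pricing above. A quick awk sanity check, using the rates from the pricing table (the 90% hit rate is the scenario's assumption):

```sh
# 250K input tokens: 90% read from cache at $0.06/1M, 10% written at
# $0.375/1M, versus paying the standard $0.30/1M for everything.
awk 'BEGIN {
  tokens = 250000; hit = 0.90
  cached = tokens * hit * (0.06 / 1e6) + tokens * (1 - hit) * (0.375 / 1e6)
  plain  = tokens * (0.30 / 1e6)
  printf "cached: ~$%.3f   uncached: $%.3f\n", cached, plain
}'
```

About $0.023 versus $0.075 per request: the ~$0.02 and $0.08 figures in the table, once rounded.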
## Already set up? Verify it works.

1. **Test tokens.** `/brewtools:provider-switch verify` — sends a minimal request to each configured provider and shows the HTTP status and response for every token.
2. **Check the model.** Run your alias (`claudeglm`), then `/brewtools:provider-switch model-check` — it asks 5 diagnostic questions to confirm which model is actually responding. No curl, no scripts; the model answers directly.
## More

- **Brewtools overview** — all brewtools skills in one place.
- **GitHub source** — source code and provider references.
- **Updating plugins** — run `/brewtools:plugin-update` to check and update the brewcode plugin suite in one command. See the FAQ for details.