provider-switch — switch LLM provider
Tip
NEW — DeepSeek V4 (April 24, 2026) — added as priority default. deepseek-v4-pro (1.6T MoE, 1M context, thinking modes), top open-source on SWE-bench, verified prompt caching (90% discount, automatic). Alias claudedeepseek. No compat flags required.
GPT-5.5 released April 23, 2026 — new #1 on SWE-bench Verified (88.7%), but Opus 4.7 still wins SWE-bench Pro (64.3% vs 58.6%). Accessible via claudeor (OpenRouter).
SWE-bench snapshot (April 2026)
| # | Model | Verified | Pro | Notes |
|---|---|---|---|---|
| 1 | GPT-5.5 | 88.7% | 58.6% | OpenRouter only |
| 2 | Claude Opus 4.7 | 87.6% | 64.3% | Native Anthropic |
| 3 | GPT-5.3 Codex | 85.0% | — | OpenRouter only |
| 4 | Gemini 3.1 Pro | 80.6% | — | OpenRouter only |
| 4 | DeepSeek V4 Pro | 80.6% | 55.4% | claudedeepseek — top open-source |
| — | GLM-5.1 | ~47% | — | claudeglm — strong per dollar |
Sources: SWE-bench · SWE-bench Pro (Scale)
Caution
Anthropic Max costs $100-200/month, and you can still hit hard rate limits mid-workday. Thursday evening, Friday morning: locked out. Paid premium, zero output.
Tip
One command fixes it. Run claudedeepseek or claudeglm — Claude Code switches to DeepSeek V4, GLM-5.1, or another backup. Cost: ~$10/month backup budget. New terminal = back on Anthropic.
Your week on Anthropic Max
- Monday
Fresh cycle. Claude Code at full speed. Everything works.
- Wednesday
Heavy refactor. Deep in code. Limits at 60%.
- Thursday evening
Rate limit hit. Claude Code stops responding. Stuck.
- Friday without backup
Full workday lost. Waiting for next cycle. $200 paid, no output.
- Friday with provider-switch
claudedeepseek: back to coding in 3 seconds. Same quality. $0.04 per session.
The math
Without backup
$100-200/month subscription. Limits hit Thursday. Friday = zero productivity.
With provider-switch
Same subscription + ~$10/month on backup. Limits hit Thursday. Friday = full workday. Zero downtime.
One command. That’s it.
claudedeepseek # DeepSeek V4 Pro (PRIORITY): 1.6T MoE, 1M ctx
claudeglm # GLM-5.1: strong coding performance per dollar, 200K ctx
claudeqwen # Qwen 3.6 Plus: 1M context window
claudeminimax # MiniMax M2.7: cheapest per token
claudeor # OpenRouter: 200+ models, pick any
Tip
Each alias sets env vars and launches Claude Code automatically. To return to your Anthropic subscription, open a new terminal.
Setup in 2 minutes
- Get an API key
Sign up at your chosen provider (links below). Top up $5-10. Takes 30 seconds.
- Run the skill
/brewtools:provider-switch setup: paste your key when asked. The alias is written to ~/.zshrc automatically.
- Done
Next time limits hit, type claudedeepseek. Back to coding.
Providers
DeepSeek V4
Priority · Top open-source · Cache verified
deepseek-v4-pro: 1.6T MoE, 1M context, thinking modes. SWE-bench Verified 80.6%, Pro 55.4%. 90% cache discount, automatic. Alias claudedeepseek. No compat flags. Leaderboard.
Z.ai / GLM
Strong coding perf · Cache verified
glm-5.1: $1.40 / $4.40 per 1M tokens (input / output). 200K context. Backend auto-cache (native).
Qwen
Cache verified
qwen3.6-plus: ~$0.50 / $2.00 per 1M tokens (input / output). 1M context. Implicit cache billed at 20% of the standard rate, explicit cache at 10% (confirmed in billing).
MiniMax
Cache verified
minimax-m2.7: $0.30 / $1.20 per 1M tokens (input / output). 200K context. Cheapest option with documented cache pricing.
OpenRouter
200+ models. Route to GPT-5.5 (new #1 Verified, 88.7%), GPT-5.3 Codex, Gemini 3.1 Pro, GLM, Qwen, Llama. One config, any provider. Custom model IDs validated via API.
Caution
Qwen: Singapore region only. The Anthropic-compatible endpoint works ONLY with API keys from the Singapore region. Open Model Studio → Singapore → API Key, create a key there. Valid format: sk-... (~36 chars). Keys from Frankfurt (sk-ws-...) return 403.
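The key-format rule in the caution above can be checked before you paste a key into the setup. The helper below is a minimal sketch, not part of the skill: `qwen_key_kind` is a hypothetical name, and it only looks at the prefix, so it cannot prove which region actually issued a key.

```shell
# Hypothetical helper: classify a Model Studio key by prefix only.
# sk-ws-... is the Frankfurt format described above (returns 403);
# any other sk-... key matches the Singapore format.
qwen_key_kind() {
  case "$1" in
    sk-ws-*) echo "frankfurt" ;;   # must be tested before the broader sk-* pattern
    sk-*)    echo "singapore" ;;
    *)       echo "unknown" ;;
  esac
}

qwen_key_kind "sk-ws-example"   # prints: frankfurt
qwen_key_kind "sk-example"      # prints: singapore
```

A "singapore" result means the format looks right, nothing more. The verify step is still the real test.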
Technical details
How it works
Each alias sets 4-6 environment variables (ANTHROPIC_BASE_URL, auth key, model overrides) and runs claude. Claude Code reads these on startup and connects to the alternative provider instead of Anthropic.
Env vars only persist in the current shell session. Opening a new terminal resets everything — Claude Code uses your Anthropic subscription again.
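As a rough illustration of the mechanism, an alias of this kind could look like the function below. This is a sketch under assumptions, not the code the skill writes: the function name, the endpoint URL, and the `BACKUP_API_KEY` variable are hypothetical, and `ANTHROPIC_MODEL` is assumed to be one of the model overrides. The token plus empty API key pair follows the unified auth pattern this page documents.

```shell
# Hypothetical sketch of a provider alias, written as a shell function.
claudebackup_sketch() {
  # Subshell: the exports below disappear when claude exits.
  (
    export ANTHROPIC_BASE_URL="https://provider.example/anthropic"   # hypothetical endpoint
    export ANTHROPIC_AUTH_TOKEN="${BACKUP_API_KEY:?set BACKUP_API_KEY first}"  # hypothetical key variable
    export ANTHROPIC_API_KEY=""                 # empty on purpose: blocks OAuth fallback
    export ANTHROPIC_MODEL="deepseek-v4-pro"    # assumed model override variable
    claude "$@"
  )
}
```

The subshell is a choice made for this sketch. It is stricter than the behavior described above, where the variables last for the whole shell session and a new terminal resets them.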
Auth by provider
| Provider | Auth pattern | Notes |
|---|---|---|
| All providers | ANTHROPIC_AUTH_TOKEN + ANTHROPIC_API_KEY="" | Unified: Bearer token + empty API key blocks OAuth fallback |
Context window
Claude Code defaults to 200K for non-Anthropic providers. The [1m] suffix on model names (e.g., qwen3.6-plus[1m]) forces 1M context. Applied automatically by the skill.
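The suffix rule is simple enough to sketch as a helper. `with_1m_suffix` is a hypothetical name used for illustration; how the skill applies the suffix internally is not shown on this page.

```shell
# Hypothetical helper: append [1m] unless the model name already ends with it.
with_1m_suffix() {
  case "$1" in
    *"[1m]") printf '%s\n' "$1" ;;        # already suffixed, leave unchanged
    *)       printf '%s\n' "${1}[1m]" ;;  # braces keep zsh from reading [1m] as a subscript
  esac
}

with_1m_suffix "qwen3.6-plus"       # prints: qwen3.6-plus[1m]
with_1m_suffix "qwen3.6-plus[1m]"   # prints: qwen3.6-plus[1m]
```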
Skill arguments
| Argument | What it does |
|---|---|
| (none) | Show status table |
| setup | Interactive setup |
| deepseek / ds / glm / qwen / minimax / openrouter | Single-provider setup |
| verify | Test all configured tokens against endpoints |
| model-check | Identify which model is actually responding (run inside provider session) |
| help | Switching cheat sheet |
Prompt caching
Claude Code automatically sends cache_control markers to reduce input token costs. Only input tokens can be cached — output tokens are always billed at full rate, regardless of provider.
Provider support
| Provider | Prompt Cache | Status | Details |
|---|---|---|---|
| DeepSeek V4 | Yes | Verified | Automatic on 1024+ token repeated prefixes. 90% discount ($0.03/M on Pro). No opt-in required |
| MiniMax | Yes | Verified | Fully documented — pricing, response fields, TTL |
| Qwen | Yes | Verified | Implicit cache 20%, explicit 10% of standard rate. Min 1024 tokens. Confirmed in Alibaba billing |
| Z.ai / GLM | Yes | Verified | Backend auto-cache (native). Applied transparently at the provider side |
| OpenRouter | Model-dependent | — | Anthropic models: yes. Non-Anthropic (Qwen, GLM): routed to provider cache |
MiniMax cache pricing (minimax-m2.7)
| Token type | Price / 1M | vs Standard input |
|---|---|---|
| Standard input | $0.30 | — |
| Cache write (first request) | $0.375 | +25% |
| Cache read (subsequent) | $0.06 | -80% |
Cache TTL: 5 minutes, auto-refreshed on each hit. Minimum: 512 input tokens.
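To see when the 25% write premium pays off, the rates in the table can be plugged into a short calculation. This is a sketch, assuming every request after the first lands inside the 5-minute TTL and hits the cache. The function name is hypothetical.

```shell
# Input cost (USD) for n requests that share one cached prefix of `tokens` tokens,
# using the minimax-m2.7 rates from the table above (USD per 1M tokens).
minimax_prefix_cost() {
  awk -v tokens="$1" -v n="$2" 'BEGIN {
    standard = 0.30; write = 0.375; read = 0.06
    uncached = n * tokens * standard / 1000000
    # First request pays the write rate, the remaining n-1 pay the read rate.
    cached = (tokens * write + (n - 1) * tokens * read) / 1000000
    printf "uncached=%.4f cached=%.4f\n", uncached, cached
  }'
}

minimax_prefix_cost 100000 1   # prints: uncached=0.0300 cached=0.0375
minimax_prefix_cost 100000 5   # prints: uncached=0.1500 cached=0.0615
```

With a single request the cache costs slightly more. From the second request on it is cheaper, since 0.375 + 0.06 is already below 2 × 0.30.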
What this means in practice
A typical Claude Code request (~250K input tokens) with 90% cache hit rate:
| Provider | Cost per request | Without cache |
|---|---|---|
| DeepSeek V4 Pro (cached) | ~$0.01 | $0.44 |
| MiniMax (cached) | ~$0.02 | $0.08 |
| Qwen (cached, explicit 10%) | ~$0.02 | $0.13 |
| Anthropic Opus 4.7 (cached) | ~$0.24 | $1.25 |
| Z.ai / GLM (backend cache) | ~$0.35 | $0.35 |
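The MiniMax and Qwen rows can be reproduced from the per-token prices listed on this page. The function below is a sketch with a hypothetical name. It blends the cached and standard input rates by hit rate and ignores cache-write premiums and output tokens.

```shell
# Args: input tokens, standard USD per 1M, cached USD per 1M, cache hit rate (0..1).
request_cost() {
  awk -v t="$1" -v std="$2" -v cached="$3" -v hit="$4" 'BEGIN {
    printf "%.3f\n", (t * hit * cached + t * (1 - hit) * std) / 1000000
  }'
}

request_cost 250000 0.30 0.06 0.9   # MiniMax, 90% hits: prints 0.021
request_cost 250000 0.50 0.05 0.9   # Qwen, explicit cache at 10% of $0.50: prints 0.024
request_cost 250000 0.30 0.06 0     # MiniMax, no cache: prints 0.075
```

The results round to the ~$0.02 and $0.08 figures in the table.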
Already set up? Verify it works.
- Test tokens
/brewtools:provider-switch verify: sends a minimal request to each configured provider. Shows HTTP status and response for every token.
- Check the model
Run your alias (claudeglm), then /brewtools:provider-switch model-check. It asks 5 diagnostic questions to confirm which model is actually responding. No curl, no scripts: the model answers directly.
Brewtools overview
All brewtools skills in one place.
GitHub source
Source code and provider references.
Updating plugins
Run /brewtools:plugin-update to check and update the brewcode plugin suite in one command.
See the FAQ for details.