# Rate limits hit. Keep coding.
> [!CAUTION]
> Anthropic Max costs $100-200/month, yet hard rate limits still run out mid-workday. Thursday evening, Friday morning: locked out. Premium paid, zero output.
> [!TIP]
> One command fixes it. Run `claudeglm` and Claude Code switches to GLM-5.1, the #1 model on SWE-bench Pro. Cost: ~$10/month backup budget. New terminal = back on Anthropic.
## Your week on Anthropic Max
- **Monday.** Fresh cycle. Claude Code at full speed. Everything works.
- **Wednesday.** Heavy refactor. Deep in code. Limits at 60%.
- **Thursday evening.** Rate limit hit. Claude Code stops responding. Stuck.
- **Friday without backup.** Full workday lost. Waiting for the next cycle. $200 paid, no output.
- **Friday with provider-switch.** `claudeglm`, and you're back to coding in 3 seconds. Same quality. $0.04 per session.
## The math
- **Without backup:** $100-200/month subscription. Limits hit Thursday. Friday = zero productivity.
- **With provider-switch:** the same subscription + ~$10/month on backup. Limits hit Thursday. Friday = full workday. Zero downtime.
## One command. That’s it.
```sh
claudeglm       # GLM-5.1 — #1 SWE-bench Pro, 200K ctx
claudeqwen      # Qwen 3.6 Plus — 1M context window
claudeminimax   # MiniMax M2.7 — cheapest per token
claudeor        # OpenRouter — 200+ models, pick any
```
> [!TIP]
> Each alias sets env vars and launches Claude Code automatically. To return to your Anthropic subscription, open a new terminal.
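Under the hood, an alias presumably looks something like the sketch below. The endpoint URL and key are placeholder assumptions, not the skill's exact output; the skill writes the real values to `~/.zshrc` for you.

```sh
# Hypothetical shape of a claudeglm-style alias; the base URL and key
# placeholder are assumptions, not the skill's exact output.
claudeglm() {
  export ANTHROPIC_BASE_URL="https://api.z.ai/api/anthropic"  # assumed provider endpoint
  export ANTHROPIC_AUTH_TOKEN="sk-your-provider-key"          # Bearer token auth
  export ANTHROPIC_API_KEY=""                                 # empty: blocks the OAuth fallback
  export ANTHROPIC_MODEL="glm-5.1"                            # route requests to GLM
  claude "$@"                                                 # launch Claude Code
}
```

Because the exports live only in the shell where the function runs, opening a new terminal drops you straight back onto your Anthropic subscription.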
## Setup in 2 minutes
1. **Get an API key.** Sign up at your chosen provider (links below). Top up $5-10. Takes 30 seconds.
2. **Run the skill.** `/brewtools:provider-switch setup` — paste your key when asked. The alias is written to `~/.zshrc` automatically.
3. **Done.** Next time limits hit, type `claudeglm`. Back to coding.
## Providers
### Z.ai / GLM

`glm-5.1` — $1.40 / $4.40 per 1M tokens. 200K context. #1 on SWE-bench; best coding performance per dollar.

### Qwen

`qwen3.6-plus` — ~$0.50 / $2.00 per 1M tokens. 1M context window. Best for large codebases.

### MiniMax

`minimax-m2.7` — $0.30 / $1.20 per 1M tokens. 200K context. Cheapest option, with verified prompt caching.

### OpenRouter

200+ models. Route to GLM, Qwen, Gemini, Llama. One config, any provider. Custom model IDs validated via API.
> [!CAUTION]
> **Qwen: Singapore region only.** The Anthropic-compatible endpoint works ONLY with API keys from the Singapore region. Open Model Studio → Singapore → API Key and create a key there. Valid format: `sk-...` (~36 chars). Keys from Frankfurt (`sk-ws-...`) return 403.
## Technical details

### How it works

Each alias sets 4-6 environment variables (`ANTHROPIC_BASE_URL`, auth key, model overrides) and runs `claude`. Claude Code reads these on startup and connects to the alternative provider instead of Anthropic.

Env vars persist only in the current shell session. Opening a new terminal resets everything, and Claude Code uses your Anthropic subscription again.
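That session scoping is ordinary shell behavior, easy to see with a subshell (the parentheses below play the role of the alias's terminal session):

```sh
unset ANTHROPIC_BASE_URL
# Inside the subshell the variable exists; the parent shell never sees it.
( export ANTHROPIC_BASE_URL="https://example.invalid"; echo "inside: $ANTHROPIC_BASE_URL" )
echo "outside: ${ANTHROPIC_BASE_URL:-unset}"   # prints "outside: unset"
```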
### Auth by provider

| Provider | Auth pattern | Notes |
|---|---|---|
| All providers | `ANTHROPIC_AUTH_TOKEN` + `ANTHROPIC_API_KEY=""` | Unified: Bearer token plus an empty API key blocks the OAuth fallback |
### Context window

Claude Code defaults to 200K for non-Anthropic providers. The `[1m]` suffix on model names (e.g., `qwen3.6-plus[1m]`) forces 1M context. Applied automatically by the skill.
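Concretely, the override presumably looks like this, using Claude Code's `ANTHROPIC_MODEL` variable; note the suffix is part of the model string itself, not a separate setting:

```sh
# 200K default window for non-Anthropic providers:
export ANTHROPIC_MODEL="qwen3.6-plus"
# [1m] appended to the model name forces the 1M window:
export ANTHROPIC_MODEL="qwen3.6-plus[1m]"
echo "$ANTHROPIC_MODEL"   # prints "qwen3.6-plus[1m]"
```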
### Skill arguments

| Argument | What it does |
|---|---|
| (none) | Show status table |
| `setup` | Interactive setup |
| `glm` / `qwen` / `minimax` / `openrouter` | Single-provider setup |
| `verify` | Test all configured tokens against endpoints |
| `model-check` | Identify which model is actually responding (run inside a provider session) |
| `help` | Switching cheat sheet |
### Prompt caching

Claude Code automatically sends `cache_control` markers to reduce input-token costs. Only input tokens can be cached; output tokens are always billed at the full rate, regardless of provider.
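For reference, a request body carrying such a marker looks roughly like the sketch below (Anthropic Messages API shape; the payload is illustrative, not a captured Claude Code request):

```sh
# Illustrative Messages-API body with an ephemeral cache marker on the
# system block; Claude Code emits these markers itself.
cat > /tmp/cached_request.json <<'EOF'
{
  "model": "minimax-m2.7",
  "max_tokens": 64,
  "system": [
    {
      "type": "text",
      "text": "Large shared project context goes here...",
      "cache_control": { "type": "ephemeral" }
    }
  ],
  "messages": [ { "role": "user", "content": "Summarize the context." } ]
}
EOF
grep -c '"cache_control"' /tmp/cached_request.json   # prints 1
```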
#### Provider support
| Provider | Prompt Cache | Status | Details |
|---|---|---|---|
| MiniMax | Yes | Verified | Fully documented — pricing, response fields, TTL. Recommended |
| Qwen | Claimed | Unverified | Field accepted per compatibility matrix. No cache pricing or response fields documented |
| Z.ai / GLM | Auto (backend) | Unverified | Native auto-cache may reduce costs on backend. Not visible to Claude Code |
| OpenRouter | Model-dependent | — | Anthropic models: yes. Non-Anthropic (Qwen, GLM): no |
#### MiniMax cache pricing (`minimax-m2.7`)
| Token type | Price / 1M | vs Standard input |
|---|---|---|
| Standard input | $0.30 | — |
| Cache write (first request) | $0.375 | +25% |
| Cache read (subsequent) | $0.06 | -80% |
Cache TTL: 5 minutes, auto-refreshed on each hit. Minimum: 512 input tokens.
#### What this means in practice

A typical Claude Code request (~250K input tokens) with a 90% cache hit rate:
| Provider | Cost per request | Without cache |
|---|---|---|
| MiniMax (cached) | ~$0.02 | $0.08 |
| Anthropic Opus 4.6 (cached) | ~$0.24 | $1.25 |
| Qwen (no confirmed cache) | $0.10-0.13 | $0.10-0.13 |
| Z.ai (no visible cache) | $0.35 | $0.35 |
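The MiniMax row follows directly from the cache pricing above. A quick awk sanity check, using the rates from the pricing table (the 90% hit rate is the scenario's assumption):

```sh
# 250K input tokens: 90% read from cache at $0.06/1M, 10% written at
# $0.375/1M, versus paying the standard $0.30/1M for everything.
awk 'BEGIN {
  tokens = 250000; hit = 0.90
  cached = tokens * hit * (0.06 / 1e6) + tokens * (1 - hit) * (0.375 / 1e6)
  plain  = tokens * (0.30 / 1e6)
  printf "cached: ~$%.3f   uncached: $%.3f\n", cached, plain
}'
```

About $0.023 versus $0.075 per request: the ~$0.02 and $0.08 figures in the table, once rounded.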
## Already set up? Verify it works.

1. **Test tokens.** `/brewtools:provider-switch verify` — sends a minimal request to each configured provider and shows the HTTP status and response for every token.
2. **Check the model.** Run your alias (`claudeglm`), then `/brewtools:provider-switch model-check` — it asks 5 diagnostic questions to confirm which model is actually responding. No curl, no scripts; the model answers directly.
## More

- **Brewtools overview** — all brewtools skills in one place.
- **GitHub source** — source code and provider references.
- **Updating plugins** — run `/brewtools:plugin-update` to check and update the brewcode plugin suite in one command. See the FAQ for details.