provider-switch — switch LLM provider

Tip

NEW — DeepSeek V4 (April 24, 2026) — added as priority default. deepseek-v4-pro (1.6T MoE, 1M context, thinking modes), top open-source on SWE-bench, verified prompt caching (90% discount, automatic). Alias claudedeepseek. No compat flags required.

GPT-5.5 released April 23, 2026 — new #1 on SWE-bench Verified (88.7%), but Opus 4.7 still wins SWE-bench Pro (64.3% vs 58.6%). Accessible via claudeor (OpenRouter).

SWE-bench snapshot (April 2026)

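| # | Model | Verified | Pro | Notes |
|---|-------|----------|-----|-------|
| 1 | GPT-5.5 | 88.7% | 58.6% | OpenRouter only |
| 2 | Claude Opus 4.7 | 87.6% | 64.3% | Native Anthropic |
| 3 | GPT-5.3 Codex | 85.0% | | OpenRouter only |
| 4 | Gemini 3.1 Pro | 80.6% | | OpenRouter only |
| 4 | DeepSeek V4 Pro | 80.6% | 55.4% | claudedeepseek, top open-source |

Unranked: GLM-5.1, ~47%, claudeglm, strong per dollar.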

Sources: SWE-bench · SWE-bench Pro (Scale)

Caution

Anthropic Max costs $100-200/month, yet its hard rate limits can still be exhausted mid-week. Thursday evening, Friday morning: locked out. Paid premium, zero output.

Tip

One command fixes it. Run claudedeepseek or claudeglm — Claude Code switches to DeepSeek V4, GLM-5.1, or another backup. Cost: ~$10/month backup budget. New terminal = back on Anthropic.

Your week on Anthropic Max

  1. Monday

    Fresh cycle. Claude Code at full speed. Everything works.

  2. Wednesday

    Heavy refactor. Deep in code. Limits at 60%.

  3. Thursday evening

    Rate limit hit. Claude Code stops responding. Stuck.

  4. Friday without backup

    Full workday lost. Waiting for next cycle. $200 paid, no output.

  5. Friday with provider-switch

    claudedeepseek: back to coding in 3 seconds. Comparable quality (80.6% vs 87.6% on SWE-bench Verified). $0.04 per session.

The math

Without backup

$100-200/month subscription. Limits hit Thursday. Friday = zero productivity.

With provider-switch

Same subscription + ~$10/month on backup. Limits hit Thursday. Friday = full workday. Zero downtime.

One command. That’s it.

claudedeepseek  # DeepSeek V4 Pro (PRIORITY): 1.6T MoE, 1M ctx
claudeglm       # GLM-5.1: strong per dollar, 200K ctx
claudeqwen      # Qwen 3.6 Plus: 1M context window
claudeminimax   # MiniMax M2.7: cheapest per token
claudeor        # OpenRouter: 200+ models, pick any

Tip

Each alias sets env vars and launches Claude Code automatically. To return to your Anthropic subscription — open a new terminal.

Setup in 2 minutes

  1. Get an API key

    Sign up at your chosen provider (links below). Top up $5-10. Takes 30 seconds.

  2. Run the skill

    /brewtools:provider-switch setup — paste your key when asked. Alias is written to ~/.zshrc automatically.

  3. Done

    Next time limits hit: type claudedeepseek. Back to coding.

Providers

DeepSeek V4

Priority · Top open-source · Cache verified

deepseek-v4-pro: 1.6T MoE, 1M context, thinking modes. SWE-bench Verified 80.6%, Pro 55.4%. 90% cache discount, automatic. Alias claudedeepseek. No compat flags.

Z.ai / GLM

Strong coding perf · Cache verified

glm-5.1: $1.40 / $4.40 per 1M tokens (input / output). 200K context. Backend auto-cache (native).

Qwen

Cache verified

qwen3.6-plus: ~$0.50 / $2.00 per 1M tokens (input / output). 1M context. Implicit cache hits are billed at 20% of the standard rate, explicit at 10% (confirmed in billing).

MiniMax

Cache verified

minimax-m2.7: $0.30 / $1.20 per 1M tokens (input / output). 200K context. Cheapest option with documented cache pricing.

OpenRouter

200+ models. Route to GPT-5.5 (new #1 Verified, 88.7%), GPT-5.3 Codex, Gemini 3.1 Pro, GLM, Qwen, Llama. One config, any provider. Custom model IDs validated via API.

Caution

Qwen: Singapore region only. The Anthropic-compatible endpoint works ONLY with API keys from the Singapore region. Open Model Studio → Singapore → API Key, create a key there. Valid format: sk-... (~36 chars). Keys from Frankfurt (sk-ws-...) return 403.
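
A quick pre-flight check along these lines can catch a wrong-region key before it produces a 403. This is only a sketch built on the two prefixes quoted above: the function name `qwen_key_check` is hypothetical and not part of the skill, and a plausible prefix does not prove that a key is valid or that it was issued in Singapore.

```shell
# Hypothetical helper: classify a Qwen API key by its prefix.
# Uses only the two formats described above; not an official validator.
qwen_key_check() {
  case "$1" in
    sk-ws-*) echo "frankfurt" ;;  # Frankfurt-style key: the endpoint returns 403
    sk-*)    echo "plausible" ;;  # matches the expected sk-... shape
    *)       echo "unknown" ;;    # neither documented format
  esac
}
```

For example, `qwen_key_check "$QWEN_KEY"` (where `QWEN_KEY` is whatever variable holds your key) prints `frankfurt` for a key that needs to be recreated in the Singapore region.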

Technical details

How it works

Each alias sets 4-6 environment variables (ANTHROPIC_BASE_URL, auth key, model overrides) and runs claude. Claude Code reads these on startup and connects to the alternative provider instead of Anthropic.

Env vars only persist in the current shell session. Opening a new terminal resets everything — Claude Code uses your Anthropic subscription again.
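
To make the mechanism concrete, here is a rough sketch of what such an alias could look like when written as a shell function. It is not the code the skill generates: the base URL is a placeholder, `DEEPSEEK_API_KEY` and `CLAUDE_BIN` are hypothetical names introduced for illustration, and only one model override (`ANTHROPIC_MODEL`) is shown.

```shell
# Hypothetical sketch of a provider alias; the skill's real output may differ.
claudedeepseek_sketch() {
  export ANTHROPIC_BASE_URL="https://provider.example/anthropic"  # placeholder URL
  export ANTHROPIC_AUTH_TOKEN="${DEEPSEEK_API_KEY:-}"             # hypothetical key variable
  export ANTHROPIC_API_KEY=""                                     # deliberately empty
  export ANTHROPIC_MODEL="deepseek-v4-pro"                        # model override
  # CLAUDE_BIN is a hypothetical override for dry runs; it defaults to claude.
  "${CLAUDE_BIN:-claude}" "$@"
}
```

Because the variables are exported in the current shell, they stay set after `claude` exits, which is why a new terminal is the simplest way back to the Anthropic subscription. Running `CLAUDE_BIN=env claudedeepseek_sketch` prints the environment instead of launching Claude Code, which is handy for inspecting what would be passed along.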

Auth by provider

| Provider | Auth pattern | Notes |
|----------|--------------|-------|
| All providers | ANTHROPIC_AUTH_TOKEN + ANTHROPIC_API_KEY="" | Unified: Bearer token + empty API key blocks OAuth fallback |
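
In shell terms, the pattern above sets `ANTHROPIC_API_KEY` to the empty string rather than leaving it unset, and the two states are easy to confuse when debugging. The sketch below shows one way to tell them apart; `api_key_state` is a hypothetical helper, and whether Claude Code treats an unset variable differently from an empty one is not something this page states.

```shell
# Hypothetical helper: report the state of ANTHROPIC_API_KEY in the current shell.
api_key_state() {
  if [ -z "${ANTHROPIC_API_KEY+set}" ]; then
    echo "unset"      # variable does not exist at all
  elif [ -z "$ANTHROPIC_API_KEY" ]; then
    echo "empty"      # variable exists and is "", as in the pattern above
  else
    echo "non-empty"  # a real key is present
  fi
}
```

Inside a provider session started by one of the aliases, this would be expected to print `empty`.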

Context window

Claude Code defaults to 200K for non-Anthropic providers. The [1m] suffix on model names (e.g., qwen3.6-plus[1m]) forces 1M context. Applied automatically by the skill.
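
If you maintain model names by hand, the suffix rule can be expressed as a small helper. This is an illustrative sketch, not the skill's own code; `with_1m_context` is a hypothetical name.

```shell
# Hypothetical helper: append the [1m] suffix unless the name already ends with it.
with_1m_context() {
  case "$1" in
    *"[1m]") printf '%s\n' "$1" ;;        # already suffixed, leave unchanged
    *)       printf '%s\n' "${1}[1m]" ;;  # add the suffix
  esac
}
```

For example, `with_1m_context qwen3.6-plus` prints `qwen3.6-plus[1m]`, and calling it again on that result leaves the name unchanged.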

Skill arguments

| Argument | What it does |
|----------|--------------|
| (none) | Show status table |
| setup | Interactive setup |
| deepseek / ds / glm / qwen / minimax / openrouter | Single-provider setup |
| verify | Test all configured tokens against endpoints |
| model-check | Identify which model is actually responding (run inside provider session) |
| help | Switching cheat sheet |

Prompt caching

Claude Code automatically sends cache_control markers to reduce input token costs. Only input tokens can be cached — output tokens are always billed at full rate, regardless of provider.

Provider support

| Provider | Prompt Cache | Status | Details |
|----------|--------------|--------|---------|
| DeepSeek V4 | Yes | Verified | Automatic on 1024+ token repeated prefixes. 90% discount ($0.03/M on Pro). No opt-in required |
| MiniMax | Yes | Verified | Fully documented: pricing, response fields, TTL |
| Qwen | Yes | Verified | Implicit cache 20%, explicit 10% of standard rate. Min 1024 tokens. Confirmed in Alibaba billing |
| Z.ai / GLM | Yes | Verified | Backend auto-cache (native). Applied transparently at the provider side |
| OpenRouter | Model-dependent | | Anthropic models: yes. Non-Anthropic (Qwen, GLM): routed to provider cache |

MiniMax cache pricing (minimax-m2.7)

| Token type | Price / 1M | vs Standard input |
|------------|------------|-------------------|
| Standard input | $0.30 | |
| Cache write (first request) | $0.375 | +25% |
| Cache read (subsequent) | $0.06 | -80% |

Cache TTL: 5 minutes, auto-refreshed on each hit. Minimum: 512 input tokens.

What this means in practice

A typical Claude Code request (~250K input tokens) with 90% cache hit rate:

| Provider | Cost per request | Without cache |
|----------|------------------|---------------|
| DeepSeek V4 Pro (cached) | ~$0.01 | $0.44 |
| MiniMax (cached) | ~$0.02 | $0.08 |
| Qwen (cached, explicit 10%) | ~$0.02 | $0.13 |
| Anthropic Opus 4.7 (cached) | ~$0.24 | $1.25 |
| Z.ai / GLM (backend cache) | ~$0.35 | $0.35 |
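
The MiniMax row can be reproduced with a simple blend of cached and uncached input, using the prices from the MiniMax cache pricing table. The sketch below is one plausible way to do the arithmetic, not necessarily how the figures above were produced: `request_cost` is a hypothetical helper, and it ignores the cache-write surcharge on the first request.

```shell
# Hypothetical helper: blended input cost in USD for a single request.
# Args: input tokens, standard price per 1M, cache-read price per 1M, hit rate (0..1).
# Assumption: the cache-write surcharge is ignored.
request_cost() {
  awk -v t="$1" -v p="$2" -v c="$3" -v h="$4" \
    'BEGIN { printf "%.3f\n", t / 1000000 * (h * c + (1 - h) * p) }'
}

request_cost 250000 0.30 0.06 0.9   # MiniMax with a 90% cache hit rate
request_cost 250000 0.30 0.30 0     # MiniMax with no cache hits
```

The two calls print 0.021 and 0.075, which round to the ~$0.02 and $0.08 shown for MiniMax.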

Already set up? Verify it works.

  1. Test tokens

    /brewtools:provider-switch verify — sends a minimal request to each configured provider. Shows HTTP status and response for every token.

  2. Check the model

    Run your alias (claudeglm), then /brewtools:provider-switch model-check — asks 5 diagnostic questions to confirm which model is actually responding. No curl, no scripts — the model answers directly.

Brewtools overview

All brewtools skills in one place.

GitHub source

Source code and provider references.

Updating plugins

Use /brewtools:plugin-update to check and update the brewcode plugin suite in one command. See the FAQ for details.