Rate limits hit. Keep coding.

Caution

Anthropic Max costs $100-200/month. Hard rate limits still run out mid-workday. Thursday evening, Friday morning — locked out. Paid premium, zero output.

Tip

One command fixes it. Run claudeglm — Claude Code switches to GLM-5.1, the #1 model on SWE-bench Pro. Cost: ~$10/month backup budget. New terminal = back on Anthropic.

Your week on Anthropic Max

  1. Monday

    Fresh cycle. Claude Code at full speed. Everything works.

  2. Wednesday

    Heavy refactor. Deep in code. Limits at 60%.

  3. Thursday evening

    Rate limit hit. Claude Code stops responding. Stuck.

  4. Friday without backup

    Full workday lost. Waiting for next cycle. $200 paid, no output.

  5. Friday with provider-switch

    claudeglm — back to coding in 3 seconds. Same quality. $0.04 per session.

The math

Without backup

$100-200/month subscription. Limits hit Thursday. Friday = zero productivity.

With provider-switch

Same subscription + ~$10/month on backup. Limits hit Thursday. Friday = full workday. Zero downtime.

One command. That’s it.

claudeglm       # GLM-5.1 — #1 SWE-bench Pro, 200K ctx
claudeqwen      # Qwen 3.6 Plus — 1M context window
claudeminimax   # MiniMax M2.7 — cheapest per token
claudeor        # OpenRouter — 200+ models, pick any

Tip

Each alias sets env vars and launches Claude Code automatically. To return to your Anthropic subscription — open a new terminal.
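In spirit, each generated alias amounts to a small shell function like the sketch below. The endpoint URL and the exact variable set are assumptions; the auth pattern (Bearer token plus an empty API key) matches what the skill documents under Technical details.

```shell
# Hypothetical sketch of what the skill writes to ~/.zshrc.
# Claude Code reads these variables on startup; the endpoint URL
# and model ID here are illustrative, not the skill's exact output.
claudeglm() {
  export ANTHROPIC_BASE_URL="https://api.z.ai/api/anthropic"  # assumed endpoint
  export ANTHROPIC_AUTH_TOKEN="$ZAI_API_KEY"                  # Bearer token
  export ANTHROPIC_API_KEY=""                                 # blocks OAuth fallback
  export ANTHROPIC_MODEL="glm-5.1"                            # model override
  claude "$@"
}
```

Because the exports happen inside the function, they only affect the shell session where you ran the alias.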

Setup in 2 minutes

  1. Get an API key

    Sign up at your chosen provider (links below). Top up $5-10. Takes 30 seconds.

  2. Run the skill

    /brewtools:provider-switch setup — paste your key when asked. Alias is written to ~/.zshrc automatically.

  3. Done

    Next time limits hit: type claudeglm. Back to coding.

Providers

Z.ai / GLM

glm-5.1, #1 on SWE-bench Pro: $1.40 / $4.40 per 1M tokens. 200K context. Best coding performance per dollar.

Qwen

qwen3.6-plus — ~$0.50 / $2.00 per 1M tokens. 1M context window. Best for large codebases.

MiniMax

minimax-m2.7: $0.30 / $1.20 per 1M tokens. 200K context. Cheapest option, with verified prompt caching.

OpenRouter

200+ models. Route to GLM, Qwen, Gemini, Llama. One config, any provider. Custom model IDs validated via API.

Caution

Qwen: Singapore region only. The Anthropic-compatible endpoint works ONLY with API keys from the Singapore region. Open Model Studio → Singapore → API Key, create a key there. Valid format: sk-... (~36 chars). Keys from Frankfurt (sk-ws-...) return 403.
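A quick pre-flight check for the key-format rule above can save a confusing 403. This helper is hypothetical (not part of the skill); the prefix rule comes straight from the caution: Singapore keys start with `sk-`, Frankfurt keys with `sk-ws-`.

```shell
# Hypothetical helper: reject Frankfurt-region Qwen keys before setup.
check_qwen_key() {
  case "$1" in
    sk-ws-*) echo "Frankfurt key: will return 403, create one in Singapore"; return 1 ;;
    sk-*)    echo "Looks like a Singapore key"; return 0 ;;
    *)       echo "Unrecognized key format"; return 1 ;;
  esac
}
```

Note the `sk-ws-*` pattern must be matched before `sk-*`, since every Frankfurt key also matches the shorter prefix.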

Technical details

How it works

Each alias sets 4-6 environment variables (ANTHROPIC_BASE_URL, auth key, model overrides) and runs claude. Claude Code reads these on startup and connects to the alternative provider instead of Anthropic.

Env vars only persist in the current shell session. Opening a new terminal resets everything — Claude Code uses your Anthropic subscription again.
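The session-scoping is plain shell behavior, which a subshell demonstrates: exports vanish when the shell that set them exits, which is exactly why a fresh terminal falls back to your Anthropic subscription.

```shell
# Exports made in a subshell do not leak into the parent session.
unset ANTHROPIC_BASE_URL
(
  export ANTHROPIC_BASE_URL="https://example-provider/api"
  echo "inside:  $ANTHROPIC_BASE_URL"
)
echo "outside: ${ANTHROPIC_BASE_URL:-unset}"   # prints "outside: unset"
```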

Auth by provider

| Provider | Auth pattern | Notes |
|---|---|---|
| All providers | `ANTHROPIC_AUTH_TOKEN` + `ANTHROPIC_API_KEY=""` | Unified: Bearer token + empty API key blocks OAuth fallback |

Context window

Claude Code defaults to 200K for non-Anthropic providers. The [1m] suffix on model names (e.g., qwen3.6-plus[1m]) forces 1M context. Applied automatically by the skill.
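The skill applies the suffix for you, but forcing it manually is a one-liner (treating `ANTHROPIC_MODEL` as the model-override variable, per the alias mechanism described above):

```shell
# "[1m]" on the model ID lifts the 200K default to a 1M context window.
export ANTHROPIC_MODEL="qwen3.6-plus[1m]"
```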

Skill arguments

| Argument | What it does |
|---|---|
| (none) | Show status table |
| `setup` | Interactive setup |
| `glm` / `qwen` / `minimax` / `openrouter` | Single-provider setup |
| `verify` | Test all configured tokens against endpoints |
| `model-check` | Identify which model is actually responding (run inside provider session) |
| `help` | Switching cheat sheet |
Prompt caching

Claude Code automatically sends cache_control markers to reduce input token costs. Only input tokens can be cached — output tokens are always billed at full rate, regardless of provider.
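On the wire, a cached block in the Anthropic Messages API request body carries a `cache_control` marker. The fragment below shows the shape; the model ID and prompt text are placeholders, not what Claude Code literally sends.

```shell
# Shape of a cache_control marker in an Anthropic-format request body
# (placeholder values; only the marked block is eligible for caching).
body=$(cat <<'JSON'
{
  "model": "minimax-m2.7",
  "system": [
    {
      "type": "text",
      "text": "Large shared project context goes here...",
      "cache_control": {"type": "ephemeral"}
    }
  ],
  "messages": [{"role": "user", "content": "Refactor utils.py"}]
}
JSON
)
echo "$body"
```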

Provider support

| Provider | Prompt Cache | Status | Details |
|---|---|---|---|
| MiniMax | Yes | Verified | Fully documented: pricing, response fields, TTL. Recommended |
| Qwen | Claimed | Unverified | Field accepted per compatibility matrix. No cache pricing or response fields documented |
| Z.ai / GLM | Auto (backend) | Unverified | Native auto-cache may reduce costs on backend. Not visible to Claude Code |
| OpenRouter | Model-dependent | | Anthropic models: yes. Non-Anthropic (Qwen, GLM): no |

MiniMax cache pricing (minimax-m2.7)

| Token type | Price / 1M | vs standard input |
|---|---|---|
| Standard input | $0.30 | |
| Cache write (first request) | $0.375 | +25% |
| Cache read (subsequent) | $0.06 | -80% |

Cache TTL: 5 minutes, auto-refreshed on each hit. Minimum: 512 input tokens.

What this means in practice

A typical Claude Code request (~250K input tokens) with 90% cache hit rate:

| Provider | Cost per request | Without cache |
|---|---|---|
| MiniMax (cached) | ~$0.02 | $0.08 |
| Anthropic Opus 4.6 (cached) | ~$0.24 | $1.25 |
| Qwen (no confirmed cache) | $0.10-0.13 | $0.10-0.13 |
| Z.ai (no visible cache) | $0.35 | $0.35 |
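The MiniMax row follows directly from the cache-pricing table; a quick check, under the assumption that the uncached 10% of tokens is billed at the cache-write rate:

```shell
# Reproduce "MiniMax (cached)": 250K input tokens, 90% cache hit rate,
# prices in $/1M tokens from the cache-pricing table above.
awk 'BEGIN {
  tokens = 250000; hit = 0.90
  read = 0.06; write = 0.375; std = 0.30
  cached = tokens * hit * read / 1e6         # tokens served from cache
  fresh  = tokens * (1 - hit) * write / 1e6  # tokens newly written to cache
  printf "with cache:    $%.4f\n", cached + fresh      # ~ $0.02
  printf "without cache: $%.4f\n", tokens * std / 1e6  # $0.075, table rounds to $0.08
}'
```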

Already set up? Verify it works.

  1. Test tokens

    /brewtools:provider-switch verify — sends a minimal request to each configured provider. Shows HTTP status and response for every token.

  2. Check the model

    Run your alias (claudeglm), then /brewtools:provider-switch model-check — asks 5 diagnostic questions to confirm which model is actually responding. No curl, no scripts — the model answers directly.

Brewtools overview

All brewtools skills in one place.

GitHub source

Source code and provider references.

Updating plugins

Use /brewtools:plugin-update to check and update the brewcode plugin suite in one command. See the FAQ for details.