glm-zai-specialist
sonnet auto-delegatedCaution
The Z.ai API uses OpenAI-compatible schema but its own endpoint and key. Mixing up ZAI_API_KEY with OPENROUTER_API_KEY, or pointing at the wrong base URL, produces 401 errors with no helpful message. The agent always validates the environment before sending a request.
Tip
This agent is auto-delegated by /brewui:glm-design-to-code when Z.ai is the selected provider (the default). You can also invoke it directly for ad-hoc API calls, rate-limit debugging, or model selection questions.
Quick reference
| Field | Value |
|---|---|
| Agent name | glm-zai-specialist |
| Model | sonnet |
| Tools | Read, Write, Edit, Bash, Glob, Grep |
| Triggers | ”zai api”, “glm request”, “z.ai”, “send to glm”, “glm vision”, “glm model”, “design to code api”, “glm-5v”, “glm-4.6v” |
| Endpoint | https://api.z.ai/api/paas/v4/chat/completions |
| Fallback | https://openrouter.ai/api/v1/chat/completions |
When to use
- Vision request — send a screenshot or image to a GLM vision model and get structured output (HTML, code, JSON)
- Model selection — pick the right GLM tier (free flash vs paid turbo) for your budget and quality target
- Rate limit debugging — diagnose and recover from 429 errors with backoff or provider switch
- Response parsing — extract
===FILE: ...===blocks from multi-file GLM output - Pipeline troubleshooting — validate
ZAI_API_KEY,jq,base64, and pipeline scripts exist before any request
Examples
"Send this screenshot to GLM for design-to-code conversion"
"GLM is returning 429 errors, fix the request"
"Which GLM model should I use for a free iteration round?"
Flow
- Validate prerequisites
Checks
ZAI_API_KEY(orOPENROUTER_API_KEYfor fallback),jq,base64,curl, and pipeline scripts under$BU_PLUGIN_ROOT/skills/glm-design-to-code/scripts/. Stops with an explicit error if anything is missing — never silently sends a broken request. - Select model
Free dev/test:
glm-4.6v-flash. Budget production:glm-4.6v. Max quality:glm-5v-turbo. Vision models required for image input — text-only models (glm-4.5-flash,glm-5-turbo) are rejected for image payloads. - Build payload
Encodes the image as base64, wraps it in the OpenAI-compatible content array, merges the system prompt and context file, and sets
max_tokens. Usesglm-build-request.shor constructs manually withjqfor non-standard inputs. - Send request
Calls
glm-request.shwhich adds—retry 3 —retry-delay 5and streams the response to a file. Checks HTTP status, logs token usage (prompt / completion / reasoning), and reports estimated cost. - Parse response
Extracts
choices[0].message.content. Checksfinish_reason—stopis complete,lengthmeans truncated (increasesmax_tokensor splits the task). If the response uses===FILE: path===markers, runsglm-extract.shto write each file to the output directory. - Report
Summarises: model used, tokens consumed, estimated cost, list of extracted files, and next step (e.g., open Playwright verification or hand off to
tester).
Z.ai model matrix & rate limits
| Model | Vision | Input $/1M | Output $/1M | Context | Notes |
|---|---|---|---|---|---|
glm-5v-turbo | image + video | $1.20 | $4.00 | 202K | Best quality, CogViT |
glm-4.6v-flash | image | FREE | FREE | 131K | Free dev/test |
glm-4.6v | image + video | $0.30 | $0.90 | 131K | Budget production |
glm-5-turbo | text only | $1.20 | $4.00 | 202K | Text flagship |
glm-4.7-flash | text only | FREE | FREE | 202K | Free text |
glm-4.5-flash | text only | FREE | FREE | 131K | Free text |
Rate limit handling:
| Scenario | Solution |
|---|---|
| Free tier 429 | glm-request.sh retries: 5 s → 10 s → 20 s |
| Persistent 429 | Switch to paid model or wait 60 s |
| Image too large | Resize to 1024 px max side, reduce JPEG quality |
Common errors:
| Error | Cause | Fix |
|---|---|---|
| 401 Unauthorized | Invalid or missing API key | Check ZAI_API_KEY / OPENROUTER_API_KEY |
| 429 Too Many Requests | Free-tier rate limit | Retry with delay or use paid tier |
| 400 Bad Request | Malformed payload | jq empty payload.json |
| Empty content | Reasoning-only response | Read reasoning_content field instead |
finish_reason: length | Output truncated | Increase max_tokens (up to 131072) |
GLM OpenRouter Specialist
Same capabilities routed via OpenRouter — use as a fallback when Z.ai is rate-limited or unavailable.
glm-design-to-code
The skill that orchestrates this agent — screenshot or URL to multi-framework code in one command.
GitHub source
Agent definition, pipeline scripts, and prompt templates.
Brewui overview
All brewui skills and agents — image generation, design-to-code, GLM providers.
Updating plugins
/brewtools:plugin-update to check and update the brewcode plugin suite in one command.
See the FAQ for details.