
feat: add Z.AI (Zhipu AI) provider support #74

Open
vinit13792 wants to merge 3 commits into repowise-dev:main from vinit13792:feat/litellm-local-proxy

Conversation

@vinit13792

Summary

  • Add ZAIProvider with OpenAI-compatible API for Z.AI (Zhipu AI)
  • Thinking disabled by default for the GLM-5 family to avoid reasoning-token overhead (see the sketch after this list)
  • Plan selection: coding (subscription) or general (pay-as-you-go)
  • Environment variables: ZAI_API_KEY, ZAI_PLAN, ZAI_BASE_URL, ZAI_THINKING
  • Rate limit defaults and auto-detection in CLI helpers
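
For reviewers, here's a minimal sketch of the shape of the provider. This is not the PR's actual code -- the class and method names are illustrative, and the plan-to-URL mapping is my assumption about the endpoints:

```python
# Illustrative sketch only -- class/method names and the plan-to-URL mapping
# are assumptions, not the PR's actual code.
import os
from openai import OpenAI

PLAN_BASE_URLS = {
    "coding": "https://api.z.ai/api/coding/paas/v4",  # assumed subscription endpoint
    "general": "https://api.z.ai/api/paas/v4",        # assumed pay-as-you-go endpoint
}

class ZAIProvider:
    def __init__(self) -> None:
        plan = os.environ.get("ZAI_PLAN", "coding")
        self.client = OpenAI(
            api_key=os.environ["ZAI_API_KEY"],
            base_url=os.environ.get("ZAI_BASE_URL", PLAN_BASE_URLS[plan]),
        )
        # Off by default: GLM-5 reasoning tokens add cost and latency.
        self.thinking = os.environ.get("ZAI_THINKING", "false").lower() == "true"

    def complete(self, model: str, messages: list[dict]) -> str:
        extra_body = {}
        if model.startswith("glm-5"):
            # Z.AI's OpenAI-compatible API toggles reasoning via extra_body.
            extra_body["thinking"] = {"type": "enabled" if self.thinking else "disabled"}
        resp = self.client.chat.completions.create(
            model=model, messages=messages, extra_body=extra_body
        )
        return resp.choices[0].message.content
```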

Usage

```bash
# Coding plan (subscription) - default
export ZAI_API_KEY=your-key
repowise init --provider zai --model glm-5.1

# General plan (pay-as-you-go)
export ZAI_API_KEY=your-key
export ZAI_PLAN=general
repowise init --provider zai
```

Test Plan

  • Unit tests pass (21 tests for ZAI provider)
  • Lint and type checks pass
  • Follows existing provider patterns (OllamaProvider, LiteLLMProvider)

Closes #68

vinit13792 and others added 3 commits April 12, 2026 11:31
- Add litellm to interactive provider selection menu
- Support LITELLM_BASE_URL for local proxy deployments (no API key required)
- Auto-add openai/ prefix when using api_base for proper LiteLLM routing
- Add dummy API key for local proxies (OpenAI SDK requirement)
- Add validation and tests for litellm provider configuration

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… false positives

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add first-class support for Z.AI with OpenAI-compatible API.

- New ZAIProvider with thinking disabled by default for GLM-5 family
- Plan selection: 'coding' (subscription) or 'general' (pay-as-you-go)
- Environment variables: ZAI_API_KEY, ZAI_PLAN, ZAI_BASE_URL, ZAI_THINKING
- Rate limit defaults and auto-detection in CLI helpers

Closes repowise-dev#68
@Societus

Thanks for picking this up. I filed #68 and have been testing against the Z.AI API directly -- a few observations.

Rate limits are unverified. The 60 RPM / 150K TPM defaults are copied from the litellm entry. I'm currently working with Z.AI to get actual per-plan limits -- their rate-limiting behavior under concurrent load is one of the open questions blocking my own PR attempt. These defaults may be fine as a placeholder, but they're worth a # TODO comment noting they're provisional. The main reason I mention it: repowise jobs default to a concurrency of 5, so running a high-end model like GLM-5.1 against a limit of 1 concurrent request produces a long queue of failed generations, because their API returns blank output wrapped in a 429 error.
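
For what it's worth, the guard I've been using in my own testing looks roughly like this -- the function name and retry policy are mine, not part of this PR:

```python
# Hypothetical 429 guard from my local testing -- not part of this PR.
import time
from openai import RateLimitError  # the SDK raises this on HTTP 429

def complete_with_backoff(provider, model, messages, retries=5, base_delay=2.0):
    """Retry on 429s; Z.AI sheds load by returning a blank body with a 429."""
    for attempt in range(retries):
        try:
            return provider.complete(model, messages)
        except RateLimitError:
            if attempt == retries - 1:
                raise
            time.sleep(base_delay * 2**attempt)  # exponential backoff
```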

Thinking toggle is Z.AI-specific. @RaghavChamadiya mentioned wanting a generic mechanism. From my testing across providers, each one handles this differently (Z.AI uses extra_body, vLLM/Qwen3 uses chat_template_kwargs, LM Studio has no API control at all), so a provider-level hook may make more sense than a one-size-fits-all abstraction. Just flagging since it was asked about in the issue.
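
To make that concrete, a rough sketch of the hook idea -- the method name is hypothetical, and the payloads reflect my testing notes above, not code in this PR:

```python
# Hypothetical provider-level hook: each provider owns its own wire format.
class Provider:
    def thinking_kwargs(self, enabled: bool) -> dict:
        return {}  # default: no API-level control (e.g. LM Studio)

class ZAIProvider(Provider):
    def thinking_kwargs(self, enabled: bool) -> dict:
        # Z.AI: OpenAI-compatible extra_body field
        return {"extra_body": {"thinking": {"type": "enabled" if enabled else "disabled"}}}

class VLLMProvider(Provider):
    def thinking_kwargs(self, enabled: bool) -> dict:
        # vLLM/Qwen3: toggled through chat template kwargs
        return {"extra_body": {"chat_template_kwargs": {"enable_thinking": enabled}}}
```

The call site then merges provider.thinking_kwargs(enabled) into the request kwargs instead of special-casing each backend.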

I'm still dialing in some Z.AI-specific behavior (rate limits under concurrency, thinking toggle edge cases) and will share data as it comes in.

@Societus

Quick update since my last comment -- I heard back from Z.AI support with specifics on concurrency limits per tier and have submitted a follow-up PR (#80) that implements tier-aware rate limiting.

Key findings from Z.AI support:

  • Limits are aggregate across all models (not per-model), dynamically adjusted based on system load
  • GLM-5 family models consume 2-3x quota per prompt (reasoning token overhead)
  • Recommended starting concurrency: Lite 2-3, Pro 5-8, Max 10-15

PR #80 includes ZAI_TIER=lite|pro|max env var support, conservative per-tier RPM/TPM defaults, and a bumped retry budget (5 retries / 30s backoff) to better handle their load-shedding under concurrent use. It builds on top of this PR's provider work, rebased onto latest main.
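
For reference, the tier mapping in #80 is shaped roughly like this -- the concurrency values come from Z.AI support's guidance above, but the RPM/TPM numbers below are illustrative placeholders, not the values actually committed:

```python
# Shape of the tier-aware defaults; rpm/tpm here are illustrative placeholders.
import os

TIER_DEFAULTS = {
    #        (concurrency, rpm, tpm)
    "lite": (2, 30, 75_000),
    "pro":  (5, 60, 150_000),
    "max":  (10, 120, 300_000),
}

def rate_limits_for(model: str) -> tuple[int, int, int]:
    tier = os.environ.get("ZAI_TIER", "lite")
    concurrency, rpm, tpm = TIER_DEFAULTS[tier]
    if model.startswith("glm-5"):
        # GLM-5 family burns 2-3x quota per prompt (reasoning-token overhead),
        # so budget tokens as if each request were ~2.5x its nominal size.
        tpm = int(tpm / 2.5)
    return concurrency, rpm, tpm
```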

Happy to split out just the tier changes if the maintainers prefer it as a stacked PR on top of this one instead.

