
feat: add Z.AI (Zhipu AI) provider support #74

Open
vinit13792 wants to merge 3 commits into repowise-dev:main from vinit13792:feat/litellm-local-proxy

Conversation

@vinit13792

Summary

  • Add ZAIProvider with OpenAI-compatible API for Z.AI (Zhipu AI)
  • Thinking disabled by default for the GLM-5 family to avoid reasoning-token overhead (see the sketch after this list)
  • Plan selection: coding (subscription) or general (pay-as-you-go)
  • Environment variables: ZAI_API_KEY, ZAI_PLAN, ZAI_BASE_URL, ZAI_THINKING
  • Rate limit defaults and auto-detection in CLI helpers
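
For reviewers, here's a minimal sketch of the shape of the provider. This is not the PR's actual code -- the class and method names are illustrative, and the plan-to-URL mapping is my assumption about the endpoints:

```python
# Illustrative sketch only -- class/method names and the plan-to-URL mapping
# are assumptions, not the PR's actual code.
import os
from openai import OpenAI

PLAN_BASE_URLS = {
    "coding": "https://api.z.ai/api/coding/paas/v4",  # assumed subscription endpoint
    "general": "https://api.z.ai/api/paas/v4",        # assumed pay-as-you-go endpoint
}

class ZAIProvider:
    def __init__(self) -> None:
        plan = os.environ.get("ZAI_PLAN", "coding")
        self.client = OpenAI(
            api_key=os.environ["ZAI_API_KEY"],
            base_url=os.environ.get("ZAI_BASE_URL", PLAN_BASE_URLS[plan]),
        )
        # Off by default: GLM-5 reasoning tokens add cost and latency.
        self.thinking = os.environ.get("ZAI_THINKING", "false").lower() == "true"

    def complete(self, model: str, messages: list[dict]) -> str:
        extra_body = {}
        if model.startswith("glm-5"):
            # Z.AI's OpenAI-compatible API toggles reasoning via extra_body.
            extra_body["thinking"] = {"type": "enabled" if self.thinking else "disabled"}
        resp = self.client.chat.completions.create(
            model=model, messages=messages, extra_body=extra_body
        )
        return resp.choices[0].message.content
```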

Usage

```bash
# Coding plan (subscription) - default
export ZAI_API_KEY=your-key
repowise init --provider zai --model glm-5.1

# General plan (pay-as-you-go)
export ZAI_API_KEY=your-key
export ZAI_PLAN=general
repowise init --provider zai
```

Test Plan

  • Unit tests pass (21 tests for ZAI provider)
  • Lint and type checks pass
  • Follows existing provider patterns (OllamaProvider, LiteLLMProvider)

Closes #68

vinit13792 and others added 3 commits April 12, 2026 11:31
- Add litellm to interactive provider selection menu
- Support LITELLM_BASE_URL for local proxy deployments (no API key required)
- Auto-add openai/ prefix when using api_base for proper LiteLLM routing
- Add dummy API key for local proxies (OpenAI SDK requirement)
- Add validation and tests for litellm provider configuration

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… false positives

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add first-class support for Z.AI with OpenAI-compatible API.

- New ZAIProvider with thinking disabled by default for GLM-5 family
- Plan selection: 'coding' (subscription) or 'general' (pay-as-you-go)
- Environment variables: ZAI_API_KEY, ZAI_PLAN, ZAI_BASE_URL, ZAI_THINKING
- Rate limit defaults and auto-detection in CLI helpers

Closes repowise-dev#68
@Societus

Thanks for picking this up. I filed #68 and have been testing against the Z.AI API directly -- a few observations.

Rate limits are unverified. The 60 RPM / 150K TPM defaults are copied from the litellm entry. I'm currently working with Z.AI to get actual per-plan limits -- their rate-limiting behavior under concurrent load is one of the open questions blocking my own PR attempt. These defaults may be fine as a placeholder, but they're worth a # TODO comment noting they're provisional. The main reason I mention it: repowise jobs default to a concurrency of 5, so running a high-end model like GLM-5.1 against a limit of 1 concurrent request produces a long queue of failed generations, because their API returns blank output wrapped in a 429 error.
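
For what it's worth, the guard I've been using in my own testing looks roughly like this -- the function name and retry policy are mine, not part of this PR:

```python
# Hypothetical 429 guard from my local testing -- not part of this PR.
import time
from openai import RateLimitError  # the SDK raises this on HTTP 429

def complete_with_backoff(provider, model, messages, retries=5, base_delay=2.0):
    """Retry on 429s; Z.AI sheds load by returning a blank body with a 429."""
    for attempt in range(retries):
        try:
            return provider.complete(model, messages)
        except RateLimitError:
            if attempt == retries - 1:
                raise
            time.sleep(base_delay * 2**attempt)  # exponential backoff
```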

Thinking toggle is Z.AI-specific. @RaghavChamadiya mentioned wanting a generic mechanism. From my testing across providers, each one handles this differently (Z.AI uses extra_body, vLLM/Qwen3 uses chat_template_kwargs, LM Studio has no API control at all), so a provider-level hook may make more sense than a one-size-fits-all abstraction. Just flagging since it was asked about in the issue.
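
To make that concrete, a rough sketch of the hook idea -- the method name is hypothetical, and the payloads reflect my testing notes above, not code in this PR:

```python
# Hypothetical provider-level hook: each provider owns its own wire format.
class Provider:
    def thinking_kwargs(self, enabled: bool) -> dict:
        return {}  # default: no API-level control (e.g. LM Studio)

class ZAIProvider(Provider):
    def thinking_kwargs(self, enabled: bool) -> dict:
        # Z.AI: OpenAI-compatible extra_body field
        return {"extra_body": {"thinking": {"type": "enabled" if enabled else "disabled"}}}

class VLLMProvider(Provider):
    def thinking_kwargs(self, enabled: bool) -> dict:
        # vLLM/Qwen3: toggled through chat template kwargs
        return {"extra_body": {"chat_template_kwargs": {"enable_thinking": enabled}}}
```

The call site then merges provider.thinking_kwargs(enabled) into the request kwargs instead of special-casing each backend.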

I'm still dialing in some Z.AI-specific behavior (rate limits under concurrency, thinking toggle edge cases) and will share data as it comes in.

@Societus

Quick update since my last comment -- I heard back from Z.AI support with specifics on concurrency limits per tier and have submitted a follow-up PR (#80) that implements tier-aware rate limiting.

Key findings from Z.AI support:

  • Limits are aggregate across all models (not per-model), dynamically adjusted based on system load
  • GLM-5 family models consume 2-3x quota per prompt (reasoning token overhead)
  • Recommended starting concurrency: Lite 2-3, Pro 5-8, Max 10-15

PR #80 includes ZAI_TIER=lite|pro|max env var support, conservative per-tier RPM/TPM defaults, and a bumped retry budget (5 retries / 30s backoff) to better handle their load-shedding under concurrent use. It builds on top of this PR's provider work, rebased onto latest main.
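
For reference, the tier mapping in #80 is shaped roughly like this -- the concurrency values come from Z.AI support's guidance above, but the RPM/TPM numbers below are illustrative placeholders, not the values actually committed:

```python
# Shape of the tier-aware defaults; rpm/tpm here are illustrative placeholders.
import os

TIER_DEFAULTS = {
    #        (concurrency, rpm, tpm)
    "lite": (2, 30, 75_000),
    "pro":  (5, 60, 150_000),
    "max":  (10, 120, 300_000),
}

def rate_limits_for(model: str) -> tuple[int, int, int]:
    tier = os.environ.get("ZAI_TIER", "lite")
    concurrency, rpm, tpm = TIER_DEFAULTS[tier]
    if model.startswith("glm-5"):
        # GLM-5 family burns 2-3x quota per prompt (reasoning-token overhead),
        # so budget tokens as if each request were ~2.5x its nominal size.
        tpm = int(tpm / 2.5)
    return concurrency, rpm, tpm
```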

Happy to split out just the tier changes if the maintainers prefer it as a stacked PR on top of this one instead.

