feat(providers): add "local/" routing prefix for OpenAI-compatible local servers#3044
Open
np6126 wants to merge 1 commit into
Open
feat(providers): add "local/" routing prefix for OpenAI-compatible local servers#3044np6126 wants to merge 1 commit into
np6126 wants to merge 1 commit into
Conversation
…cal servers
A common use case is running an OpenAI-compatible inference server locally —
Ollama, LM Studio, vLLM, llama.cpp — and routing to it through the OpenAI
provider client.
The existing "openai/" prefix would work, but mentally conflates "local
Ollama / vLLM" with "the OpenAI API itself", and the more obvious model id
formats actively misroute:
- "qwen/qwen3:14b" or "qwen3-coder" → metadata_for_model maps the "qwen"
family to DashScope (DASHSCOPE_API_KEY, dashscope.aliyuncs.com). A user
serving Qwen3 locally on Ollama gets routed to Alibaba's cloud instead
of their local server.
- "kimi/kimi-k2.5" → same DashScope routing.
- "grok-..." → routes to xAI.
None of these can express "this is a local copy of that family, route as
OpenAI-compatible to my local server" without forcibly setting
OPENAI_BASE_URL plus the "openai/" prefix — which is unintuitive.
The "local/" prefix solves it: operators express intent ("this is a local
inference server, route as OpenAI client") without naming collision.
Routing semantics are identical to "openai/": OpenAi provider,
OPENAI_API_KEY auth env (typically an unused placeholder for local
servers), OPENAI_BASE_URL.
Two places updated:
- metadata_for_model: recognise "local/" alongside "openai/"
- wire_model_for_base_url: strip "local/" prefix on the wire
No change to strip_routing_prefix (#[allow(dead_code)], tests only) and no
change to the OpenAI gateway slug preservation logic (used by OpenRouter
and similar gateways with non-default OPENAI_BASE_URL) — that behaviour for
"openai/" stays exactly as upstream.
np6126
pushed a commit
to np6126/tank-agent-os
that referenced
this pull request
May 17, 2026
…am-drop The two patches we ship in bootc/patches/ each contained an experimental mix of changes that aren't all needed in this setup. Splitting them into intent-aligned files makes it possible to drop them individually as upstream merges happen, without having to surgically extract pieces. - claw-fix-stream-newlines.patch: now contains only the trailing-newline restoration in MarkdownStreamState::push. Matches the change in ultraworkers/claw-code#3043 1:1, so the file drops out the moment that PR lands and CLAW_CODE_REF is bumped. - claw-fix-openai-prefix-strip.patch: now contains only the "local/" routing-prefix additions in metadata_for_model and wire_model_for_base_url. The dead-code edit to strip_routing_prefix and the OpenRouter slug-preservation removal are gone — both were experimental and unneeded for this setup (we use "local/" prefix, not "openai/"). Matches ultraworkers/claw-code#3044 1:1, drops out when that PR lands. - claw-add-think-block-filter.patch (new): isolates the <think>...</think> block filtering for thinking models (Qwen3 et al). Applied on top of the newlines patch since both touch MarkdownStreamState::push. Stays as a local patch until ultraworkers/claw-code#3045 resolves; the filter remains usable as a standalone patch against post-#3043 upstream because its context lines reference the newlines-applied state. Containerfile applies the three patches in order: newlines → think → local-prefix. Verified: all three apply cleanly to the pinned CLAW_CODE_REF and the resulting tree compiles (cargo check --workspace).
|
Model id formats like qwen/qwen3:14b silently misroute to DashScope when the user is running a local Ollama server. No error — wrong destination. Adds a new prefix convention (local/) that users must learn. If OPENAI_BASE_URL is not set, local/ routes to OpenAI with a stripped model id, which will likely 404. Explicit prefix over inference. Users targeting local servers now carry that intent in the model string rather than relying on routing heuristics to guess correctly. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
A common deployment is running an OpenAI-compatible inference server locally — Ollama, LM Studio, vLLM, llama.cpp — and routing claw-code's OpenAI provider client at it via
OPENAI_BASE_URL.The obvious model id formats actively misroute today:
qwen/qwen3:14borqwen3-coder→metadata_for_modelmaps the qwen family to DashScope (DASHSCOPE_API_KEY,dashscope.aliyuncs.com). A user serving Qwen3 locally on Ollama is silently routed to Alibaba's cloud.kimi/kimi-k2.5→ same DashScope routing.grok-*→ routes to xAI.The existing
openai/prefix would work as a workaround, but mentally conflates "this is my local Qwen3" with "the OpenAI API itself", and a user who does want the existing OpenRouter behaviour (slug preserved on the wire foropenai/...) can't have it both ways.This PR adds
local/as an explicit routing prefix that says "this is a local OpenAI-compatible server, route as OpenAI client, strip the prefix on the wire". Routing semantics are identical toopenai/:OpenAiprovider,OPENAI_API_KEY(typically an unused placeholder for local servers),OPENAI_BASE_URL.Diff
Two places only — both pure additions.
No change to
strip_routing_prefix— it's#[allow(dead_code)]and only referenced from its own unit tests; addinglocalthere would be cosmetic.No change to the OpenAI gateway slug preservation logic used by OpenRouter and similar gateways with a non-default
OPENAI_BASE_URL—openai/...behaviour stays exactly as upstream.Example
Wire request:
POST $OPENAI_BASE_URL/chat/completionswithmodel=qwen3:14b. No misrouting to DashScope.Test plan
cargo check --workspacecleancargo test -p apilocal/qwen3:14bagainst a local Ollama, verify the request goes to the local server with bare model id on the wireCompanion discussion
A separate issue (#3045) covers two related open questions:
<think>...</think>block filtering in streamed output, and whether theopenai/-slug preservation logic should remain non-configurable oncelocal/is available. Posted separately because both involve tradeoffs upstream may want to weigh in on before code lands.