An OpenAI-compatible proxy that lets you use ChatJimmy's hardware-accelerated Llama 3.1 8B, powered by Taalas' custom silicon running at ~17K tokens/sec.
```bash
# Start the proxy
python proxy.py

# Enable file logging
python proxy.py --log

# Custom port and log file
python proxy.py --port 4100 --log --log-file custom.log
```

The proxy exposes standard OpenAI-compatible endpoints:
- `GET /v1/models`: Returns available models.
- `POST /v1/chat/completions`: Creates a chat completion (request body documented below).
Then point OpenCode (or any OpenAI-compatible client) at http://localhost:4100/v1.
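Any OpenAI-compatible client will do; as a minimal stdlib-only sketch (no SDK assumed, fields follow the request-body schema documented below):

```python
import json
import urllib.request

PROXY_URL = "http://localhost:4100/v1/chat/completions"

def build_request(messages, model="llama3.1-8B", stream=False):
    """Assemble a chat-completions payload accepted by the proxy."""
    return {"model": model, "messages": messages, "stream": stream}

def chat(messages):
    """POST the payload to the local proxy and return the assistant text."""
    req = urllib.request.Request(
        PROXY_URL,
        data=json.dumps(build_request(messages)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    # Requires the proxy to be running on port 4100
    print(chat([{"role": "user", "content": "Hello!"}]))
```

No API key header is sent because the upstream ChatJimmy API currently requires none.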
An opencode.json config is included in this directory. Copy it to your project root to use ChatJimmy as a provider in OpenCode:
```bash
cp opencode.json /path/to/your/project/opencode.json
```

Request body:
| Field | Type | Default | Description |
|---|---|---|---|
| `model` | string | `llama3.1-8B` | Model ID |
| `messages` | array | required | Array of `{role, content}` messages (`system`, `user`, `assistant`, `tool`) |
| `stream` | boolean | `false` | Enable SSE streaming |
| `tools` | array | `[]` | OpenAI-format tool/function definitions (filtered, see below) |
| `tool_choice` | string \| object | `"auto"` | `"auto"`, `"none"`, `"required"`, or `{"type": "function", "function": {"name": "..."}}` |
Response format (non-streaming):
```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "llama3.1-8B",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "...",
      "tool_calls": [{"id": "call_...", "type": "function", "function": {"name": "...", "arguments": "..."}}]
    },
    "finish_reason": "stop" | "tool_calls"
  }],
  "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
}
```

When `stream: true`, the proxy returns SSE chunks in the standard `chat.completion.chunk` format.
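A client consuming the stream just accumulates the `delta.content` fragments from each `data:` line. A minimal parser sketch, operating on raw SSE lines rather than any particular HTTP library:

```python
import json

def accumulate_sse(lines):
    """Collect assistant text from chat.completion.chunk SSE lines."""
    text = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and comments
        data = line[len("data: "):]
        if data == "[DONE]":  # OpenAI-style stream terminator
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if delta.get("content"):
            text.append(delta["content"])
    return "".join(text)
```

For example, a stream whose deltas carry `"Hel"` then `"lo"` accumulates to `"Hello"`.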
The proxy filters out extraneous tools from OpenCode before sending them to the model. Fewer tools means less prompt bloat and better tool-calling accuracy. The following tools are stripped from incoming requests:
`webfetch`, `todowrite`, `skill`, `question`, `task`
These are high-level orchestration tools that a small model struggles to use correctly, and removing them keeps the model focused on the core tools it can actually handle (file I/O, shell commands, search, etc.).
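The filtering amounts to a denylist check over the incoming `tools` array. A sketch, assuming OpenAI-format tool definitions (`filter_tools` is a hypothetical name, not the proxy's actual helper):

```python
# Hypothetical helper; the denylist matches the stripped tools listed above.
STRIPPED_TOOLS = {"webfetch", "todowrite", "skill", "question", "task"}

def filter_tools(tools):
    """Drop OpenAI-format tool definitions whose names are denylisted."""
    return [
        t for t in tools
        if t.get("function", {}).get("name") not in STRIPPED_TOOLS
    ]
```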
- Model: Only Llama 3.1 8B is available (aggressively quantized 3-bit/6-bit, so quality is below GPU baselines)
- System prompt size: ChatJimmy silently returns empty responses when the system prompt exceeds ~30K characters; the proxy truncates at 28K as a safeguard
- Tool calling: Emulated via `<tool_call>` XML tags in the prompt rather than native function calling, since the underlying model doesn't support it
- No authentication: ChatJimmy's API is currently in open beta with no API key required
- No streaming from upstream: The proxy buffers the full ChatJimmy response before streaming it back to the client, so time-to-first-token equals full generation time
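Because tool calling is emulated through `<tool_call>` tags, the extraction step looks roughly like this (a sketch assuming each tag wraps a JSON object with `name` and `arguments`; the proxy's real parser and payload shape may differ):

```python
import json
import re

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

def extract_tool_calls(text):
    """Pull emulated tool calls out of raw model output.

    Assumes each <tool_call> block wraps JSON such as
    {"name": "read", "arguments": {"path": "a.txt"}} -- this shape
    is an assumption for illustration, not taken from proxy.py.
    """
    calls = []
    for match in TOOL_CALL_RE.findall(text):
        try:
            calls.append(json.loads(match))
        except json.JSONDecodeError:
            continue  # malformed block: ignore rather than crash
    return calls
```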
