An OpenAI-compatible proxy that lets you use ChatJimmy's hardware-accelerated Llama 3.1 8B, powered by Taalas' custom silicon running at ~17K tokens/sec.
```bash
# Start the proxy
python proxy.py

# Enable file logging
python proxy.py --log

# Custom port and log file
python proxy.py --port 4100 --log --log-file custom.log
```

The proxy exposes standard OpenAI-compatible endpoints:
- `GET /v1/models`: Returns available models.
- `POST /v1/chat/completions`: Creates a chat completion (request body documented below).
Then point OpenCode (or any OpenAI-compatible client) at http://localhost:4100/v1.
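Any OpenAI-compatible client will do; as a minimal stdlib-only sketch (no SDK assumed, fields follow the request-body schema documented below):

```python
import json
import urllib.request

PROXY_URL = "http://localhost:4100/v1/chat/completions"

def build_request(messages, model="llama3.1-8B", stream=False):
    """Assemble a chat-completions payload accepted by the proxy."""
    return {"model": model, "messages": messages, "stream": stream}

def chat(messages):
    """POST the payload to the local proxy and return the assistant text."""
    req = urllib.request.Request(
        PROXY_URL,
        data=json.dumps(build_request(messages)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    # Requires the proxy to be running on port 4100
    print(chat([{"role": "user", "content": "Hello!"}]))
```

No API key header is sent because the upstream ChatJimmy API currently requires none.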
An opencode.json config is included in this directory. Copy it to your project root to use ChatJimmy as a provider in OpenCode:
```bash
cp opencode.json /path/to/your/project/opencode.json
```

Request body:
| Field | Type | Default | Description |
|---|---|---|---|
| `model` | string | `llama3.1-8B` | Model ID |
| `messages` | array | required | Array of `{role, content}` messages (`system`, `user`, `assistant`, `tool`) |
| `stream` | boolean | `false` | Enable SSE streaming |
| `tools` | array | `[]` | OpenAI-format tool/function definitions (filtered, see below) |
| `tool_choice` | string \| object | `"auto"` | `"auto"`, `"none"`, `"required"`, or `{"type": "function", "function": {"name": "..."}}` |
Response format (non-streaming):
```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "llama3.1-8B",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "...",
      "tool_calls": [{"id": "call_...", "type": "function", "function": {"name": "...", "arguments": "..."}}]
    },
    "finish_reason": "stop" | "tool_calls"
  }],
  "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
}
```

When `stream: true`, the proxy returns SSE chunks in the standard `chat.completion.chunk` format.
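A client consuming the stream just accumulates the `delta.content` fragments from each `data:` line. A minimal parser sketch, operating on raw SSE lines rather than any particular HTTP library:

```python
import json

def accumulate_sse(lines):
    """Collect assistant text from chat.completion.chunk SSE lines."""
    text = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and comments
        data = line[len("data: "):]
        if data == "[DONE]":  # OpenAI-style stream terminator
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if delta.get("content"):
            text.append(delta["content"])
    return "".join(text)
```

For example, a stream whose deltas carry `"Hel"` then `"lo"` accumulates to `"Hello"`.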
The proxy filters out extraneous tools from OpenCode before sending them to the model. Fewer tools means less prompt bloat and better tool-calling accuracy. The following tools are stripped from incoming requests:
`webfetch`, `todowrite`, `skill`, `question`, `task`
These are high-level orchestration tools that a small model struggles to use correctly, and removing them keeps the model focused on the core tools it can actually handle (file I/O, shell commands, search, etc.).
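The filtering amounts to a denylist check over the incoming `tools` array. A sketch, assuming OpenAI-format tool definitions (`filter_tools` is a hypothetical name, not the proxy's actual helper):

```python
# Hypothetical helper; the denylist matches the stripped tools listed above.
STRIPPED_TOOLS = {"webfetch", "todowrite", "skill", "question", "task"}

def filter_tools(tools):
    """Drop OpenAI-format tool definitions whose names are denylisted."""
    return [
        t for t in tools
        if t.get("function", {}).get("name") not in STRIPPED_TOOLS
    ]
```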
- Model: Only Llama 3.1 8B is available (aggressively quantized 3-bit/6-bit, so quality is below GPU baselines)
- System prompt size: ChatJimmy silently returns empty responses when the system prompt exceeds ~30K characters; the proxy truncates at 28K as a safeguard
- Tool calling: Emulated via `<tool_call>` XML tags in the prompt rather than native function calling, since the underlying model doesn't support it
- No authentication: ChatJimmy's API is currently in open beta with no API key required
- No streaming from upstream: The proxy buffers the full ChatJimmy response before streaming it back to the client, so time-to-first-token equals full generation time
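Because tool calling is emulated through `<tool_call>` tags, the extraction step looks roughly like this (a sketch assuming each tag wraps a JSON object with `name` and `arguments`; the proxy's real parser and payload shape may differ):

```python
import json
import re

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

def extract_tool_calls(text):
    """Pull emulated tool calls out of raw model output.

    Assumes each <tool_call> block wraps JSON such as
    {"name": "read", "arguments": {"path": "a.txt"}} -- this shape
    is an assumption for illustration, not taken from proxy.py.
    """
    calls = []
    for match in TOOL_CALL_RE.findall(text):
        try:
            calls.append(json.loads(match))
        except json.JSONDecodeError:
            continue  # malformed block: ignore rather than crash
    return calls
```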
