
ChatJimmy Completions API Proxy

An OpenAI-compatible proxy that lets you use ChatJimmy's hardware-accelerated Llama 3.1 8B, powered by Taalas' custom silicon running at ~17K tokens/sec.

Usage

# Start the proxy
python proxy.py

# Enable file logging
python proxy.py --log

# Custom port and log file
python proxy.py --port 4100 --log --log-file custom.log

API

The proxy exposes standard OpenAI-compatible endpoints:

GET /v1/models

Returns available models.

POST /v1/chat/completions

Creates a chat completion (request and response formats below). Point OpenCode (or any OpenAI-compatible client) at http://localhost:4100/v1.
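For example, a minimal non-streaming request to the chat completions endpoint can be built with the standard library (the prompt text here is illustrative; `urlopen(req)` would return the chat.completion JSON documented below):

```python
import json
from urllib.request import Request

# Minimal chat completions payload for the proxy.
payload = {
    "model": "llama3.1-8B",
    "messages": [{"role": "user", "content": "Hello!"}],
}

req = Request(
    "http://localhost:4100/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) sends the request when the proxy is running.
```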

OpenCode setup

An opencode.json config is included in this directory. Copy it to your project root to use ChatJimmy as a provider in OpenCode:

cp opencode.json /path/to/your/project/opencode.json

Request body:

| Field | Type | Default | Description |
|---|---|---|---|
| model | string | llama3.1-8B | Model ID |
| messages | array | required | Array of {role, content} messages (system, user, assistant, tool) |
| stream | boolean | false | Enable SSE streaming |
| tools | array | [] | OpenAI-format tool/function definitions (filtered, see below) |
| tool_choice | string \| object | "auto" | "auto", "none", "required", or {"type": "function", "function": {"name": "..."}} |
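A request body using the fields above might look like this (the `read_file` tool definition is hypothetical, purely to show the OpenAI function format):

```python
# Hypothetical tool definition in OpenAI function format; the name and
# parameter schema are illustrative, not part of the proxy.
tools = [{
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a file from disk",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

request_body = {
    "model": "llama3.1-8B",
    "messages": [{"role": "user", "content": "Open README.md"}],
    "stream": False,        # default
    "tools": tools,
    "tool_choice": "auto",  # default; or "none", "required", or a named function
}
```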

Response format (non-streaming):

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "llama3.1-8B",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "...",
      "tool_calls": [{"id": "call_...", "type": "function", "function": {"name": "...", "arguments": "..."}}]
    },
    "finish_reason": "stop" | "tool_calls"
  }],
  "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
}

When stream: true, the proxy returns SSE chunks in the standard chat.completion.chunk format.
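On the client side, those SSE chunks can be consumed with a few lines of parsing. A sketch, assuming each event is a single `data: <json>` line and the stream ends with `data: [DONE]` (the standard OpenAI convention):

```python
import json

def parse_sse_chunks(raw: str):
    """Parse an SSE stream body into chat.completion.chunk dicts."""
    chunks = []
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":  # end-of-stream sentinel
            break
        chunks.append(json.loads(data))
    return chunks

# Example stream body with a single content delta:
raw = (
    'data: {"object": "chat.completion.chunk", '
    '"choices": [{"delta": {"content": "Hi"}}]}\n\n'
    "data: [DONE]\n\n"
)
deltas = [c["choices"][0]["delta"].get("content", "")
          for c in parse_sse_chunks(raw)]
# deltas is ["Hi"]
```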

Tool filtering

The proxy filters out extraneous tools from OpenCode before sending them to the model. Fewer tools means less prompt bloat and better tool-calling accuracy. The following tools are stripped from incoming requests:

webfetch, todowrite, skill, question, task

These are high-level orchestration tools that a small model struggles to use correctly, and removing them keeps the model focused on the core tools it can actually handle (file I/O, shell commands, search, etc.).
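The filtering step itself amounts to dropping any tool whose name is on the blocklist. A sketch (the blocklist mirrors the README; the function name is illustrative):

```python
# Tools stripped from incoming requests before forwarding to the model.
STRIPPED_TOOLS = {"webfetch", "todowrite", "skill", "question", "task"}

def filter_tools(tools):
    """Drop high-level orchestration tools the 8B model handles poorly."""
    return [t for t in tools if t["function"]["name"] not in STRIPPED_TOOLS]

tools = [
    {"type": "function", "function": {"name": "bash"}},
    {"type": "function", "function": {"name": "todowrite"}},
    {"type": "function", "function": {"name": "read"}},
]
kept = [t["function"]["name"] for t in filter_tools(tools)]
# kept is ["bash", "read"]
```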

Limitations

  • Model: Only Llama 3.1 8B is available (aggressively quantized 3-bit/6-bit, so quality is below GPU baselines)
  • System prompt size: ChatJimmy silently returns empty responses when the system prompt exceeds ~30K characters; the proxy truncates at 28K as a safeguard
  • Tool calling: Emulated via <tool_call> XML tags in the prompt rather than native function calling, since the underlying model doesn't support it natively
  • No authentication: ChatJimmy's API is currently open beta with no API key required
  • No streaming from upstream: The proxy buffers the full ChatJimmy response before streaming it back to the client, so time-to-first-token equals full generation time
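The `<tool_call>` emulation mentioned above implies the proxy must extract those tags from the model's raw output and convert them into OpenAI-style tool_calls. A sketch of that conversion, assuming the tag body is a JSON object with "name" and "arguments" (the exact payload format the proxy uses is an assumption):

```python
import json
import re
import uuid

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

def extract_tool_calls(text: str):
    """Convert emulated <tool_call> tags into OpenAI-style tool_calls."""
    calls = []
    for body in TOOL_CALL_RE.findall(text):
        parsed = json.loads(body)
        calls.append({
            "id": f"call_{uuid.uuid4().hex[:8]}",
            "type": "function",
            "function": {
                "name": parsed["name"],
                # OpenAI clients expect arguments as a JSON string
                "arguments": json.dumps(parsed.get("arguments", {})),
            },
        })
    return calls

text = '<tool_call>{"name": "read", "arguments": {"path": "a.txt"}}</tool_call>'
calls = extract_tool_calls(text)
```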
