Skip to content

alifeinbinary/penny

 
 

Repository files navigation

Penny

Penny

Your private, personalized internet companion.

Author: Jared Lockhart

CI Python 3.14+ License: MIT OpenAI-compatible LLM Signal Discord Firefox

Penny screenshot 1 Penny screenshot 2 Penny screenshot 3

Overview

Ask Penny anything and she'll search the web and text you back, always with sources.

But she's not just a question-answering bot. She pays attention. She remembers your conversations, learns what you're into, and starts sharing things she thinks you'd like on her own. She follows up on old topics when she finds something new. She gets to know you over time and her responses get more personal because of it.

Penny communicates via Signal, Discord, or a Firefox browser extension — all channels share the same conversation history. The browser extension gives her direct access to the web: she can browse pages with the full rendering engine and your session, see what you're currently looking at, and present her discoveries as a browsable feed of thought cards.

Penny is a feed only for you. Private, personal, and local.

How It Works

Conversations

When you send Penny a message, she always searches the web before responding — she never makes things up from model knowledge. A local LLM reads the search results and writes a response in her own voice: casual, calm, with sources. Penny uses the OpenAI Python SDK against any OpenAI-compatible endpoint, so you can run her against Ollama, omlx, the OpenAI cloud, vLLM, or anything else that speaks the protocol.

Penny talks to you over Signal, Discord, or a Firefox sidebar extension — the same apps you already use. All channels share conversation history: ask on Signal, follow up in the browser. Quote-reply to continue a thread; she'll walk the conversation history for context.

Preferences

Penny builds a model of what you care about. The HistoryAgent runs continuously in the background, scanning unprocessed messages and reactions and extracting preference topics in a two-pass LLM pipeline — first identify candidate topics, then classify each as positive or negative. New topics are deduplicated against existing entries via token containment ratio + embedding similarity, and only become "thinking candidates" once they cross a mention-count threshold (so a one-off comment doesn't drive autonomous research).

You can also manage preferences directly: /like dark roast coffee, /dislike cold weather, /unlike, /undislike. These drive what Penny thinks about and what she shares with you.

Thinking

When Penny is idle, she thinks. The ThinkingAgent picks a random topic from your positive preferences, searches the web, and has an inner monologue — reasoning through what she finds. The result is stored as a thought, linked back to the preference that seeded it.

Thoughts bleed into chat context, so Penny has continuity of her own reasoning. When she finds something interesting, the NotifyAgent shares it with you — scoring candidates by novelty (avoiding repeats) and sentiment (preference alignment), with exponential backoff so she doesn't overwhelm you.

Memory

Penny's memory has three layers, all assembled into chat context on every message:

  • Knowledge — when Penny browses a page (search results, article reads), the HistoryAgent summarizes the page into a prose paragraph and stores it with an embedding, keyed by URL. On the next chat turn, the most semantically relevant entries are pulled into context, scored as max(weighted_decay_against_conversation, current_message_cosine) with an absolute floor that suppresses noise on greetings and uncovered topics.
  • Related past messages — the embedding of every outgoing/incoming message is cached. When you ask something, prior messages are scored by cosine_to_current_message − α × centrality (centrality = mean cosine to the rest of the corpus, suppressing centroid-magnet boilerplate), then expanded to ±5-minute neighbors so conversational follow-ups travel together.
  • Preferences — the HistoryAgent also extracts likes/dislikes from your text messages and emoji reactions in two passes (identify topics, then classify valence), deduplicated against existing entries via TCR + embedding similarity.

Commands

Beyond regular conversation, Penny supports slash commands:

  • /commands — list every command available in this deployment
  • /profile — set your name, location, and timezone (required before chat)
  • /like, /dislike — view or add preferences
  • /unlike, /undislike — remove preferences
  • /schedule — set up recurring tasks (e.g., /schedule daily 9am weather forecast); uses LLM to parse natural-language timing
  • /unschedule — list and delete scheduled tasks
  • /config — view or tune runtime parameters (30+ values: scheduling intervals, notification backoff, dedup thresholds, email pagination limits, etc.)
  • /debug — show agent status, git commit, system info, background task state
  • /mute, /unmute — silence or resume autonomous notifications
  • /draw — generate images via a local model (requires LLM_IMAGE_MODEL)
  • /email — search your Fastmail inbox via JMAP (requires FASTMAIL_API_TOKEN)
  • /zoho — search your Zoho Mail inbox (requires ZOHO_API_ID/ZOHO_API_SECRET/ZOHO_REFRESH_TOKEN)
  • /bug, /feature — file GitHub issues (requires GitHub App credentials)
  • /test — enter isolated test mode (separate DB, fresh agents) for development

Penny's Mind

How information flows through Penny's cognitive systems — from perception to memory to thought to action.

flowchart TB
    User((User)) -->|message| Chat
    User -->|reaction| Memory

    subgraph Conversation["🗣 Conversation"]
        Chat[ChatAgent<br>search web + respond]
    end

    Memory -.->|"profile · knowledge · related msgs<br>thoughts · dislikes"| Chat
    Chat -->|reply| User
    Chat -->|log| Memory

    subgraph Memory["🧠 Memory"]
        Messages["Messages<br>(embedded)"]
        Knowledge["Knowledge<br>(page summaries)"]
        Preferences["Preferences<br>likes · dislikes"]
        Thoughts[Thoughts]
    end

    subgraph Background["💭 Background — when idle"]
        History["HistoryAgent<br>extract knowledge from<br>browses, preferences<br>from messages"]
        Thinking["ThinkingAgent<br>pick a preference,<br>research it, store thought"]
        Notify["NotifyAgent<br>score thoughts,<br>share the best one"]
    end

    Messages --> History
    History --> Knowledge
    History --> Preferences
    Preferences --> Thinking
    Thinking --> Thoughts
    Thoughts --> Notify
    Notify -->|proactive message| User

    style Conversation fill:#e8f5e9,stroke:#2e7d32
    style Memory fill:#e3f2fd,stroke:#1565c0
    style Background fill:#fff3e0,stroke:#e65100
Loading

The Cognitive Cycle

  1. Conversation — user sends a message, ChatAgent searches the web and responds. Context is assembled from memory: profile, related knowledge (semantically-relevant page summaries), related past messages (embedding similarity with centrality penalty + ±5-minute neighbor expansion), recent thoughts, and topics to avoid
  2. Digestion — when idle, HistoryAgent summarizes browsed pages into knowledge entries (one per URL, deduplicated and embedded), and extracts preferences (likes/dislikes) from text messages and emoji reactions in a two-pass LLM pipeline
  3. Reflection — ThinkingAgent picks a random positive preference, searches the web, and reasons through what it finds. The result is stored as a thought
  4. Initiative — NotifyAgent scores un-notified thoughts by novelty (avoiding repeats) and preference alignment, composes a message, and sends it with exponential backoff
  5. Repeat — the user's reaction feeds back into conversation, digestion, and reflection

Models

Penny uses up to four LLM model roles, all running locally by default:

Role Env Purpose Required?
Text LLM_MODEL Single model for all agents — chat, thinking, history, notify, schedules Yes
Embedding LLM_EMBEDDING_MODEL Embeddings for knowledge retrieval, message similarity, and preference dedup Optional
Vision LLM_VISION_MODEL Image captioning when users send photos Optional
Image LLM_IMAGE_MODEL Image generation via /draw Optional

Text, vision, and embedding all go through the OpenAI SDK and can each point at a different OpenAI-compatible endpoint via the corresponding LLM_*_API_URL / LLM_*_API_KEY overrides — useful when running text on one machine and embeddings on another. Image generation is the one exception: it talks to Ollama's /api/generate endpoint directly (set LLM_IMAGE_API_URL), because there's no OpenAI-compatible image generation protocol that works with local models.

Scheduling

Background agents run in priority order when idle (default: 60s after last message): schedule executor (always) → history → notify → thinking. Agents with no work are skipped. Foreground messages cancel the active background task immediately.

User-created scheduled tasks (via /schedule) run on their own timer regardless of idle state, so a daily weather briefing won't be blocked by an active conversation.

Runtime Configuration

30+ parameters are tunable at runtime via /config — scheduling intervals, notification backoff, preference dedup thresholds, inner monologue settings, email pagination limits, and more. Values follow a three-tier lookup: database override → environment variable → default. Changes take effect immediately without restart.

Browser Extension

The Firefox extension adds a visual, interactive layer on top of Penny's existing architecture:

  • Sidebar chat — same conversation as Signal/Discord, with HTML-formatted responses, images, clickable links, and live in-flight tool status (e.g., "Searching…", "Reading example.com…")
  • Active tab context — Penny can see the page you're currently viewing (via Defuddle content extraction). Toggle "Include page content" to ask questions about any page
  • Browser toolsbrowse_url opens pages in hidden tabs with the full web engine and your session. Per-addon "tool use" toggle controls whether each browser participates in tool dispatch
  • Domain permissions — first-time access to a new domain triggers an approve/deny prompt. Approvals persist server-side and sync across all connected addons; prompts can also be answered from Signal so you don't need a browser open. /config DOMAIN_PERMISSION_MODE allow_all skips prompting entirely
  • Thoughts feed — a browsable card grid of Penny's discoveries, with images, seed-topic bylines, and a modal viewer. Thumbs up/down reactions feed directly into the preference extraction pipeline. Browser tabs receive unread thought counts as a badge
  • Schedule manager — UI for creating, editing, and deleting /schedule cron tasks without touching the chat
  • Settings panel — domain allowlist, runtime config params (the same 30+ values /config exposes), and addon-level toggles
  • Prompt log viewer — every LLM call Penny makes is browseable from the extension, grouped by run ID with input messages, response, thinking field, and outcome badge. Useful for debugging "why did Penny say that"
  • Multi-device — each browser registers as a device (e.g., "firefox macbook 16"). All devices share the same user identity and conversation history. In-flight progress reactions on Signal also surface on the user's message via emoji morphing (💭 → 🔍 → 📖 → cleared on completion)
cd browser
npm install
npm run dev    # Build, watch, and launch Firefox with auto-reload

See docs/browser-extension-architecture.md for the full architecture and security model.

Setup & Running

Prerequisites

  1. For Signal: signal-cli-rest-api running on host (port 8080)
  2. For Discord: Discord bot token and channel ID
  3. An OpenAI-compatible LLM endpoint running on host or reachable from the container. Set LLM_API_URL to point at it. Common choices: Ollama, omlx, vLLM, the OpenAI cloud
  4. Browser extension loaded in Firefox (for web search, page reading, and the visual UI)
  5. Docker & Docker Compose installed

Quick Start

# 1. Create .env file with your configuration
cp .env.example .env
# Edit .env with your settings (Signal or Discord credentials)

# 2. Start the agent
make up

Make Commands

make up               # Build and start all services (foreground)
make prod             # Deploy penny only (no team, no override)
make kill             # Tear down containers and remove local images
make build            # Build the penny Docker image
make team-build       # Build the penny-team Docker image
make browser-build    # Bundle the browser extension content script
make check            # Format check, lint, typecheck, migrate-validate, pytest (penny + team), tsc (browser)
make pytest           # Run integration tests (penny + team)
make fix              # Format + autofix lint issues (penny + team)
make typecheck        # Type check with ty (penny + team)
make token            # Generate GitHub App installation token for gh CLI
make signal-avatar    # Set Penny's Signal profile picture
make migrate-test     # Test database migrations against a copy of prod DB
make migrate-validate # Check for duplicate migration number prefixes

All dev tool commands run in temporary Docker containers via docker compose run --rm, with source volume-mounted so changes write back to the host filesystem.

Configuration

Configuration is managed via a .env file in the project root:

# .env

# Channel type (optional — auto-detected from credentials)
# CHANNEL_TYPE="signal"  # or "discord"

# Signal Configuration (required for Signal)
SIGNAL_NUMBER="+1234567890"
SIGNAL_API_URL="http://localhost:8080"

# Discord Configuration (required for Discord)
DISCORD_BOT_TOKEN="your-bot-token"
DISCORD_CHANNEL_ID="your-channel-id"

# Browser Extension (optional)
BROWSER_ENABLED=true
BROWSER_HOST="0.0.0.0"                    # Use 0.0.0.0 in Docker
BROWSER_PORT=9090

# LLM Configuration — any OpenAI-compatible endpoint (Ollama, omlx, vLLM,
# OpenAI cloud, ...). The example URL points at a local Ollama instance.
LLM_API_URL="http://host.docker.internal:11434/v1"
LLM_MODEL="gpt-oss:20b"                   # Single model for all agents
# LLM_API_KEY="not-needed"                # Default fine for unauthenticated local backends
# LLM_VISION_MODEL="qwen3-vl"             # Optional, enables vision/image messages
# LLM_EMBEDDING_MODEL="embeddinggemma"    # Optional, enables preference/knowledge embeddings
# LLM_IMAGE_MODEL="x/z-image-turbo"       # Optional, enables /draw (uses LLM_IMAGE_API_URL)
# LLM_IMAGE_API_URL="http://host.docker.internal:11434"  # Ollama REST for /draw

# Database & Logging
DB_PATH="/penny/data/penny/penny.db"
LOG_LEVEL="INFO"
# LOG_FILE="/penny/data/penny/logs/penny.log"

# Fastmail JMAP (optional, enables /email)
# FASTMAIL_API_TOKEN="your-api-token"

# Zoho Mail (optional, enables /zoho)
# ZOHO_API_ID="..."
# ZOHO_API_SECRET="..."
# ZOHO_REFRESH_TOKEN="..."

# GitHub App (optional, enables /bug, /feature, and agent containers)
# GITHUB_APP_ID="12345"
# GITHUB_APP_PRIVATE_KEY_PATH="data/private/github-app.pem"
# GITHUB_APP_INSTALLATION_ID="67890"

# Penny-team agent containers (optional, leave blank to disable)
# CLAUDE_CODE_OAUTH_TOKEN="..."           # From `claude setup-token` (Max plan)
# OLLAMA_BACKGROUND_MODEL="..."           # Optional, enables team Quality agent

Channel Selection

Penny auto-detects which channel to use based on configured credentials:

  • If DISCORD_BOT_TOKEN and DISCORD_CHANNEL_ID are set (and Signal is not), uses Discord
  • If SIGNAL_NUMBER is set, uses Signal
  • Set CHANNEL_TYPE explicitly to override auto-detection

Configuration Reference

LLM — Penny talks to any OpenAI-compatible endpoint via the OpenAI Python SDK. There are no Ollama-specific dependencies in the runtime.

  • LLM_API_URL: API endpoint (default: http://host.docker.internal:11434)
  • LLM_MODEL: Single text model for all agents (default: gpt-oss:20b)
  • LLM_API_KEY: API key (default: "not-needed", fine for unauthenticated local backends)
  • LLM_VISION_MODEL: Vision model for image understanding (e.g., qwen3-vl). Optional; enables image messages
  • LLM_VISION_API_URL / LLM_VISION_API_KEY: Override API URL/key for the vision model (e.g., to run vision on a different host)
  • LLM_EMBEDDING_MODEL: Dedicated embedding model (e.g., embeddinggemma). Optional; enables preference/knowledge/message embeddings
  • LLM_EMBEDDING_API_URL / LLM_EMBEDDING_API_KEY: Override API URL/key for the embedding model
  • LLM_IMAGE_MODEL: Image generation model (e.g., x/z-image-turbo). Optional; enables /draw. Image generation is the one non-OpenAI endpoint — it talks to Ollama's /api/generate directly
  • LLM_IMAGE_API_URL: Ollama REST endpoint for image generation (default: http://host.docker.internal:11434)

API Keys:

  • FASTMAIL_API_TOKEN: enables /email
  • ZOHO_API_ID, ZOHO_API_SECRET, ZOHO_REFRESH_TOKEN: enables /zoho (obtain via Zoho's OAuth flow)

GitHub App (optional, enables /bug and /feature; required for agent containers):

  • GITHUB_APP_ID, GITHUB_APP_PRIVATE_KEY_PATH, GITHUB_APP_INSTALLATION_ID

Browser Extension (optional):

  • BROWSER_ENABLED: true to start the WebSocket server (default: false)
  • BROWSER_HOST: bind address (default: localhost; use 0.0.0.0 in Docker)
  • BROWSER_PORT: WebSocket port (default: 9090)

Behavior:

  • TOOL_TIMEOUT: Tool execution timeout in seconds (default: 120)
  • MESSAGE_MAX_STEPS / IDLE_SECONDS: also accepted as env vars, but these are runtime-configurable via /config so DB overrides win
  • 30+ parameters are runtime-configurable via /config — scheduling intervals, notification cooldowns/candidates, preference dedup thresholds, history context limits, email body/search/list pagination limits, related-message retrieval thresholds, and more

Logging:

  • LOG_LEVEL: DEBUG, INFO, WARNING, ERROR (default: INFO)
  • LOG_FILE: Optional path to log file
  • LOG_MAX_BYTES: Maximum log file size before rotation (default: 10 MB)
  • LOG_BACKUP_COUNT: Number of rotated backup files to keep (default: 5)
  • DB_PATH: SQLite database location (default: /penny/data/penny/penny.db)

Discord Setup

  1. Create a Discord application at https://discord.com/developers/applications
  2. Create a bot for the application and copy the token
  3. Enable these intents in the Bot settings:
    • Message Content Intent
    • Server Members Intent (optional)
  4. Invite the bot to your server with the OAuth2 URL Generator:
    • Scopes: bot
    • Permissions: Send Messages, Read Message History
  5. Get the channel ID (enable Developer Mode in Discord settings, right-click channel → Copy ID)
  6. Add to your .env:
    DISCORD_BOT_TOKEN="your-token"
    DISCORD_CHANNEL_ID="your-channel-id"

Testing & CI

Penny includes end-to-end integration tests that mock all external services:

make pytest      # Run all tests
make check       # Run format, lint, typecheck, and tests

CI runs make check in Docker on every push to main and on pull requests via GitHub Actions.

Tests cover the full message flow (search, response, threading, typing indicators), all background agents (history, thinking, notify, scheduler coordination), every slash command, vision processing, and tool edge cases. External services are replaced with mock servers and SDK patches — a mock Signal WebSocket server and a mock LLM client (MockLlmClient, patches openai.AsyncOpenAI) with configurable responses.

Agent Orchestrator

Penny includes a Python-based agent orchestrator that manages autonomous Claude CLI agents. Agents process work from GitHub Issues on a schedule, using labels as a state machine:

backlog → requirements → specification → in-progress → in-review → closed   (features)
bug → in-review → closed                                                     (bug fixes)

Agents:

  • Product Manager: Gathers requirements for requirements issues
  • Architect: Writes detailed specs for specification issues, handles spec feedback
  • Worker: Implements in-progress issues — creates branches, writes code/tests, runs make check, opens PRs; addresses PR feedback on in-review issues; fixes bug issues directly
  • Monitor: Watches production logs for errors, deduplicates against existing issues, and files bug issues automatically
  • Quality: Evaluates Penny's response quality via a local LLM, files bug issues for low-quality output (optional, requires OLLAMA_BACKGROUND_MODEL)

Each agent checks for matching GitHub issue labels before waking Claude CLI, so idle cycles cost ~1 second instead of a full Claude invocation.

make up          # Run orchestrator with full stack

License

MIT

About

A Personal Agent in Python

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

  • Python 90.5%
  • TypeScript 6.5%
  • CSS 1.8%
  • HTML 0.5%
  • Makefile 0.2%
  • Dockerfile 0.2%
  • Other 0.3%