🏗️ Harness Engineering

The universal template for multi-agent AI coding swarms

Fork this repo. Point any AI agent at it. Run for months.

No context loss. No agent collisions. No wasted tokens.

🚀 Quick Start · 📖 How It Works · 🛠️ Skills · 🤖 Providers · 📚 References

💡 What Is This?

A harness is the tool shell that lets an AI agent affect the real world. If the model is the brain, the harness is the hands and feet.

Harness Engineering is a production-ready, provider-agnostic template that coordinates multiple AI coding agents across sessions, context windows, and providers — without losing state or duplicating work.

🧠 "The best way to use AI agents isn't to make them smarter — it's to give them better tools, clearer specs, and a shared memory."

🎯 Three problems, three solutions

😰 Problem	✅ Solution
Agents lose context between sessions	📝 `claude-progress.txt` — append-only handoff log every agent reads
Multiple agents collide on the same code	🤝 `agents.json` — live swarm manifest with heartbeat & collision avoidance
Expensive models waste tokens on cheap tasks	💰 T1/T2/T3 cost tiers — Opus for architecture, Haiku for linting
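
The append-only handoff log can be sketched in a few lines. This is a minimal illustration, not the template's actual writer: the entry layout (timestamp, agent id, feature id, summary) and the `append_handoff` helper are assumptions, since this README does not show the log format.

```python
from datetime import datetime, timezone
from pathlib import Path


def append_handoff(log_path: Path, agent_id: str, feature_id: str, summary: str) -> str:
    """Append one handoff entry; earlier entries are never rewritten."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Collapse whitespace so each entry stays on exactly one line.
    one_line = " ".join(summary.split())
    # Hypothetical entry layout: "[timestamp] agent feature: summary".
    entry = f"[{stamp}] {agent_id} {feature_id}: {one_line}\n"
    # Mode "a" only ever writes at the end of the file.
    with log_path.open("a", encoding="utf-8") as log:
        log.write(entry)
    return entry
```

Opening the file in append mode is what keeps the log append-only: a later agent can add its own entry but cannot clobber an earlier one through this helper.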

🚀 Quick Start

```bash
# 1. Clone the template
git clone https://github.com/ArtemisAI/Harness_Engineering.git my-project
cd my-project

# 2. Bootstrap the environment
bash init.sh

# 3. Start your first session
#    (type the skill name in the agent's prompt, not in the shell)
# Claude Code:
/session-start

# Every other provider:
@session-start
```

The Initializer agent runs once, adapts `features.json` to your project, and scaffolds the spec pipeline. After that, every session starts with `@session-start` and ends with `@session-end` (`/session-start` and `/session-end` in Claude Code).

📖 How It Works

🏛️ Architecture

```mermaid
graph TB
    subgraph "🧠 Cognitive Layer"
        A["📄 AGENTS.md<br/><i>Universal spec for all providers</i>"]
        S["📂 skills/<br/><i>10 reusable skill files</i>"]
        C["📜 constitution.md<br/><i>Project values & invariants</i>"]
    end

    subgraph "👥 Agent Swarm"
        O["🎯 Orchestrator<br/><b>T1 Heavy</b><br/>Assigns work, detects stalls"]
        CD["💻 Coding Agent(s)<br/><b>T2 Standard</b><br/>One feature per session"]
        E["🧹 Entropy Agent<br/><b>T3 Fast</b><br/>Weekly cleanup scan"]
    end

    subgraph "💾 State Layer"
        F["features.json"]
        AG["agents.json"]
        I["issues.json"]
        P["claude-progress.txt"]
    end

    subgraph "🔧 Execution Layer (MCP)"
        FS["filesystem"]
        G["git"]
        GH["github"]
        M["memory"]
        PW["playwright"]
        DW["deepwiki"]
        C7["context7"]
    end

    A --> O
    A --> CD
    A --> E
    S --> CD
    O --> F
    O --> AG
    CD --> F
    CD --> I
    CD --> P
    E --> I
    CD --> FS
    CD --> G
    CD --> GH
    CD --> DW
    CD --> C7

    style A fill:#4CAF50,color:#fff,stroke:#388E3C
    style O fill:#FF9800,color:#fff,stroke:#F57C00
    style CD fill:#2196F3,color:#fff,stroke:#1976D2
    style E fill:#9C27B0,color:#fff,stroke:#7B1FA2
```

🔄 Session Lifecycle

```mermaid
sequenceDiagram
    participant H as 👤 Human
    participant A as 🤖 Agent
    participant S as 💾 State Files
    participant G as 🔧 Git

    H->>A: Start session
    A->>A: bash init.sh
    A->>S: Read features.json → pick highest-priority failing feature
    A->>S: Read agents.json → register self, claim feature
    A->>G: Commit registration

    rect rgb(230, 245, 255)
        Note over A,G: 💻 Implementation Phase
        A->>A: Read spec → implement → test
        A->>S: Update heartbeat every 30 min
    end

    A->>A: Run test_command ✅
    A->>S: Update features.json → "passing"
    A->>S: Write verification.md
    A->>S: Append to claude-progress.txt
    A->>S: Update agents.json → "done"
    A->>G: Commit all changes
    A->>H: Session complete, clean handoff
```
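
The first two steps of the lifecycle (pick the highest-priority failing feature, then claim it only if no live agent already holds it) can be sketched as below. The field names (`id`, `status`, `priority`, `feature`, `heartbeat`), the `"active"` status, the "lower number means higher priority" convention, and the 60-minute staleness window are all assumptions for illustration; the real schemas live in `schemas/` and are not shown in this README.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical threshold: two missed 30-minute heartbeats.
STALE_AFTER = timedelta(minutes=60)


def is_live(agent: dict, now: datetime) -> bool:
    """An agent counts as live if it is active and its heartbeat is recent."""
    last = datetime.fromisoformat(agent["heartbeat"])
    return agent["status"] == "active" and now - last <= STALE_AFTER


def pick_feature(features: list[dict], agents: list[dict], now: datetime) -> Optional[dict]:
    """Return the highest-priority failing feature that no live agent has claimed."""
    claimed = {agent["feature"] for agent in agents if is_live(agent, now)}
    candidates = [
        feature
        for feature in features
        if feature["status"] == "failing" and feature["id"] not in claimed
    ]
    # Assumption: a lower "priority" number means more urgent.
    return min(candidates, key=lambda feature: feature["priority"], default=None)
```

Treating a stale heartbeat as an abandoned claim is one way to let another agent resume stalled work; the template's orchestrator may apply a different rule.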

📋 Feature Flow

Every feature follows the spec-driven pipeline — no shortcuts:

```text
📝 Requirements  →  📐 Design  →  ✅ Tasks  →  💻 Implementation  →  🧪 Verification
     ↓                  ↓             ↓               ↓                    ↓
requirements.md    design.md     tasks.md      src/ code          verification.md
```

Each stage produces a file under docs/features/<id>/. Templates are in docs/features/TEMPLATE/.
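
As a rough sketch of how a new feature directory could be seeded from the template, assuming the four stage files listed in the project structure: `scaffold_feature` and `missing_stages` are hypothetical helpers, and the actual `new-feature` skill may work differently.

```python
import shutil
from pathlib import Path

# Stage files named in this README's TEMPLATE directory.
STAGE_FILES = ("requirements.md", "design.md", "tasks.md", "verification.md")


def scaffold_feature(repo_root: Path, feature_id: str) -> Path:
    """Copy docs/features/TEMPLATE to docs/features/<feature_id>."""
    template = repo_root / "docs" / "features" / "TEMPLATE"
    target = repo_root / "docs" / "features" / feature_id
    # copytree raises FileExistsError if the target already exists,
    # so an existing spec directory is never overwritten.
    shutil.copytree(template, target)
    return target


def missing_stages(feature_dir: Path) -> list[str]:
    """List the stage files that have not been created yet."""
    return [name for name in STAGE_FILES if not (feature_dir / name).is_file()]
```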

🛠️ Skills

All skills live in skills/ with YAML frontmatter for auto-discovery. Invoke by name:

Skill	Purpose	Tier	Invoke
🟢 `session-start`	Orient session, register in `agents.json`	T2	`@session-start`
🔴 `session-end`	Test → verify → commit → handoff	T2	`@session-end`
⚙️ `implement-feature`	Spec-driven implementation	T2	`@implement-feature`
🆕 `new-feature`	Scaffold feature entry + spec files	T2	`@new-feature`
🧪 `run-tests`	Narrowest-scope test run (3-attempt limit)	T2/T3	`@run-tests`
🐛 `create-issue`	Log to `issues.json` + GitHub Issue	T3	`@create-issue`
🔀 `delegate-subagent`	Spawn subagent in isolated worktree	T1	`@delegate-subagent`
🧹 `entropy-check`	Find drift, dead code, broken refs	T3	`@entropy-check`
📝 `write-adr`	Record architecture decision	T1	`@write-adr`
🎯 `assign-agent`	Orchestrator only — route from backlog	T1	`@assign-agent`

💡 Claude Code uses /skill-name syntax. All other providers use @skill-name.
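
The 3-attempt limit on `run-tests` can be pictured as a bounded retry loop. This is only a sketch: `run_tests` and `fix` are hypothetical callables standing in for the agent's test command and its repair step, and returning `None` stands in for "stop and ask a human".

```python
from typing import Callable, Optional

MAX_ATTEMPTS = 3  # matches the 3-attempt limit in the skills table


def attempts_until_pass(
    run_tests: Callable[[], bool],
    fix: Callable[[], None],
) -> Optional[int]:
    """Return the attempt number on which tests passed, or None after 3 failures."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        if run_tests():
            return attempt
        # Only try a fix if another attempt remains.
        if attempt < MAX_ATTEMPTS:
            fix()
    # Three consecutive failures: the caller should escalate instead of retrying.
    return None
```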

🤖 Provider Support

Works out-of-the-box with 14+ AI coding agents. All config files are pre-built and tested.

Provider	Status	Instructions	MCP Config
Claude Code	✅	`CLAUDE.md`	`.mcp.json`
GitHub Copilot	✅	`.github/copilot-instructions.md`	`.vscode/mcp.json`
Gemini CLI	✅	`.gemini/GEMINI.md`	`.gemini/settings.json`
Cursor	✅	`.cursor/rules/harness.mdc`	`.cursor/mcp.json`
OpenAI Codex	✅	`AGENTS.md`	—
Roo Code	✅	`.roo/rules/harness.md`	`.roo/mcp.json`
Cline	✅	`.clinerules`	VS Code globalStorage
Windsurf	✅	`.windsurf/rules/harness.md`	Windsurf config
Antigravity	✅	`AGENTS.md`	via AGENTS.md
OpenCode	✅	`opencode.json`	`opencode.json`
Aider	✅	`.aider.conf.yml`	—
Goose	✅	`.goosehints`	Goose config
Zed	✅	`.rules`	`.zed/settings.json`
Warp	✅	`.warp/agent-instructions.md`	UI-managed

🌍 Universal rule: Instructions that all agents need → put in AGENTS.md. Provider-specific details → that provider's own config file only.

💰 Cost Tier System

The orchestrator routes every task to the cheapest model capable of doing it correctly.

```mermaid
graph LR
    subgraph "🔴 T1 Heavy"
        H1["Claude Opus 4.6"]
        H2["Gemini 2.5 Pro"]
    end

    subgraph "🔵 T2 Standard"
        S1["Claude Sonnet 4.6"]
        S2["GPT-4o"]
        S3["Gemini 2.5 Flash"]
    end

    subgraph "🟢 T3 Fast"
        F1["Claude Haiku 4.5"]
        F2["GPT-4o-mini"]
        F3["Kimi 2.5 / Grok fast"]
    end

    H1 -.->|"Architecture<br/>ADRs<br/>Orchestration"| USE1["🏛️"]
    S1 -.->|"Implementation<br/>Code review<br/>Tests"| USE2["💻"]
    F1 -.->|"Linting<br/>Search<br/>Entropy"| USE3["⚡"]

    style H1 fill:#e74c3c,color:#fff
    style H2 fill:#e74c3c,color:#fff
    style S1 fill:#3498db,color:#fff
    style S2 fill:#3498db,color:#fff
    style S3 fill:#3498db,color:#fff
    style F1 fill:#2ecc71,color:#fff
    style F2 fill:#2ecc71,color:#fff
    style F3 fill:#2ecc71,color:#fff
```

Tier	When to use	Cost
🔴 T1 Heavy	Architecture decisions, ADRs, orchestration, complex debugging	$$$
🔵 T2 Standard	Feature implementation, code review, integration tests	$$
🟢 T3 Fast	Doc scraping, web search, linting, entropy scans, test runs	$

💡 T3 is 10–50× cheaper than T1. A documentation scrape doesn't need Opus.
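
The tier table above amounts to a lookup from task kind to tier. A minimal sketch follows; the task-kind strings and the `route` function are made up for illustration, and rejecting unknown kinds is this sketch's choice rather than documented orchestrator behavior.

```python
# Task kinds taken from the "When to use" column; the key spellings are hypothetical.
TIER_BY_TASK = {
    "architecture": "T1",
    "adr": "T1",
    "orchestration": "T1",
    "complex-debugging": "T1",
    "implementation": "T2",
    "code-review": "T2",
    "integration-tests": "T2",
    "doc-scraping": "T3",
    "web-search": "T3",
    "linting": "T3",
    "entropy-scan": "T3",
    "test-run": "T3",
}


def route(task_kind: str) -> str:
    """Return the cost tier for a task kind, refusing kinds that are not classified."""
    try:
        return TIER_BY_TASK[task_kind]
    except KeyError:
        raise ValueError(f"unclassified task kind: {task_kind!r}") from None
```

For example, `route("linting")` returns `"T3"`, so a lint pass would go to a fast model rather than a heavy one.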

📁 Project Structure

```text
📦 Harness Engineering
├── 📄 AGENTS.md                    # 🌍 Universal spec (< 200 lines, CI enforced)
├── 📄 CLAUDE.md                    # Claude Code config (imports AGENTS.md)
├── 📄 llms.txt                     # Machine-readable file index
├── 📄 init.sh                      # 🚀 Session bootstrap script
│
├── 💾 State Files
│   ├── features.json               # Feature backlog (failing → in_progress → passing)
│   ├── agents.json                 # Live swarm manifest with heartbeat
│   ├── issues.json                 # Bug/blocker registry → synced to GitHub Issues
│   └── claude-progress.txt         # Append-only session handoff log
│
├── 🛠️ skills/                      # 10 universal skill definitions
│   ├── session-start.skill.md
│   ├── session-end.skill.md
│   ├── implement-feature.skill.md
│   └── ... (7 more)
│
├── 🤖 harness/                     # Agent system prompts
│   ├── initializer/prompt.md       # First-session scaffolder
│   ├── session/prompt.md           # Per-session coding agent
│   ├── entropy/prompt.md           # Weekly cleanup agent
│   └── orchestrator/
│       ├── prompt.md               # Swarm coordinator
│       └── assignments/            # Per-feature delegation briefs
│
├── 📚 docs/
│   ├── constitution.md             # Project identity & values
│   ├── governance.md               # Autonomy scope, OWASP top 10
│   ├── project-structure.md        # Scaffold rules, naming, secrets
│   ├── decisions/                  # Architecture Decision Records
│   │   ├── 001-provider-agnostic-harness.md
│   │   ├── 002-a2a-subagent-delegation.md
│   │   ├── 003-spec-kit-alignment.md
│   │   └── 004-cost-tiered-model-selection.md
│   └── features/                   # Spec pipeline per feature
│       ├── TEMPLATE/               # requirements.md, design.md, tasks.md, verification.md
│       ├── feat-001/
│       └── ...
│
├── 🧪 tests/                       # Test scripts per feature
├── 📋 schemas/                     # JSON Schema validation
├── 🔒 .claude/                     # Hooks: bash guard, lint check, progress check
│
├── 🔧 Provider Configs (14+)
│   ├── .mcp.json                   # Claude Code MCP
│   ├── .vscode/mcp.json            # VS Code / Copilot MCP
│   ├── .gemini/settings.json       # Gemini CLI
│   ├── .cursor/                    # Cursor rules + MCP
│   ├── .roo/                       # Roo Code rules + MCP
│   ├── .github/
│   │   ├── copilot-instructions.md
│   │   ├── agents/                 # Copilot Coding Agent persona
│   │   ├── agentic-workflows/      # Auto-triage, PR review, entropy
│   │   └── workflows/              # CI: checks, entropy, code review
│   └── ... (windsurf, warp, zed, opencode, aider, goose, etc.)
│
└── 🌐 .well-known/
    └── agent-card.json             # A2A v0.3 agent card
```

🔧 MCP Servers

Two layers work together — skills teach agents what to do, MCP servers give them tools to do it:

```text
🧠 Agent prompt
  → loads skill: session-start.skill.md     ← cognitive layer (what to do)
  → calls MCP:   git_status                 ← execution layer (actually does it)
  → calls MCP:   read_file(features.json)
  → calls MCP:   bash(pytest ...)           ← via PTY sandbox
  → calls MCP:   write_file(progress.txt)
```

Server	Layer	Purpose
📁 `filesystem`	Execution	Read/write project files
🔀 `git`	Execution	Commits, diffs, log, branching
🐙 `github`	Execution	Issues, PRs, code search, CI status
🧠 `memory`	Execution	Cross-session knowledge graph
🎭 `playwright`	Execution	Browser automation, UI verification
🌐 `fetch`	Execution	Web content retrieval
🧩 `sequential-thinking`	Execution	Externalised multi-step planning
📖 `deepwiki`	Grounding	Q&A about any public GitHub repository
📚 `context7`	Grounding	Current library docs, indexed daily

⚠️ Always query deepwiki or context7 before implementing any library call. Grounding each call in current documentation reduces the risk of hallucinated APIs.

🛡️ Safety & Governance

Agent Boundaries (Always / Ask / Never)

	Rule
✅ Always	Run tests before committing · Update heartbeat · Open issues for out-of-scope work · Query grounding servers
⚠️ Ask First	Irreversible actions · New dependencies · 3 consecutive failures · Files outside `src/`, `tests/`, `docs/`
🚫 Never	WIP on `main` · Scope expansion · `rm`/`del`/`kill`/`sudo` · Upward layer imports · Secrets in git
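
The "Ask First" rule for files outside `src/`, `tests/`, and `docs/` could be checked mechanically along these lines. This sketch applies the rule literally to a repo-relative path; `needs_approval` is a hypothetical helper, and how the template treats root-level state files such as `features.json` is not specified here.

```python
from pathlib import PurePosixPath

# Directories an agent may edit without asking, per the boundary table.
ALLOWED_ROOTS = ("src", "tests", "docs")


def needs_approval(repo_relative_path: str) -> bool:
    """True when editing this path should be confirmed with a human first."""
    path = PurePosixPath(repo_relative_path)
    parts = path.parts
    # Empty, absolute, or parent-escaping paths are never auto-approved.
    if not parts or path.is_absolute() or ".." in parts:
        return True
    return parts[0] not in ALLOWED_ROOTS
```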

CI Pipeline

Every push triggers automated checks:

Job	What it validates
🏗️ Structural integrity	Required files exist, JSON schemas valid, AGENTS.md < 200 lines
🔍 Secret scan	No API keys, tokens, private keys, or JWTs in committed files
📐 Layer order	`Types → Config → Repo → Service → Runtime → UI` — no upward imports
📋 Feature specs	Every feature has a spec dir; every passing feature has `verification.md`
📝 Progress log	`claude-progress.txt` is non-empty
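
Two of these checks are simple enough to sketch. The code below assumes that "no upward imports" means a layer may import only from itself or from layers to its left in the listed order, and that feature entries carry hypothetical `id` and `status` fields; the real CI jobs under `.github/workflows/` may be implemented differently.

```python
from pathlib import Path

# Layer order from the CI table, lowest layer first.
LAYERS = ("types", "config", "repo", "service", "runtime", "ui")
RANK = {name: index for index, name in enumerate(LAYERS)}


def upward_imports(edges: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return (importer, imported) layer pairs where the import points up the stack."""
    return [(src, dst) for src, dst in edges if RANK[dst] > RANK[src]]


def missing_verification(features: list[dict], features_dir: Path) -> list[str]:
    """IDs of passing features whose spec directory lacks verification.md."""
    return [
        feature["id"]
        for feature in features
        if feature["status"] == "passing"
        and not (features_dir / feature["id"] / "verification.md").is_file()
    ]
```

A CI job would fail the build when either function returns a non-empty list.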

Hooks (Claude Code)

Hook	Trigger	Action
🛑 `pre_bash_guard`	Before any shell command	Blocks `rm`, `del`, `kill`, `sudo`, `&&`, …
🔍 `post_lint_check`	After file edit/write	Runs language-appropriate linter
📝 `stop_progress_check`	Session end	Warns if `claude-progress.txt` not updated
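
The core decision inside a guard like `pre_bash_guard` might look like the sketch below. It uses only the blocked entries visible in the table (the list there is cut off after `&&`), and `is_blocked` is a hypothetical function; how the real hook in `.claude/` receives the command and reports its verdict is not shown in this README.

```python
import shlex

# Only the entries visible in the hooks table; the original list is truncated.
BLOCKED_WORDS = {"rm", "del", "kill", "sudo"}
BLOCKED_OPERATORS = ("&&",)


def is_blocked(command: str) -> bool:
    """True when the shell command contains a blocked word or operator."""
    if any(operator in command for operator in BLOCKED_OPERATORS):
        return True
    try:
        words = shlex.split(command)
    except ValueError:
        # Unbalanced quotes: refuse rather than guess at the meaning.
        return True
    # Whole-word match, so "npm run format" is fine but "git rm x" is not.
    return any(word in BLOCKED_WORDS for word in words)
```

Matching whole words anywhere in the command is deliberately conservative: it also rejects harmless uses such as `echo rm`, which seems a reasonable trade for a guard of this kind.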

🌐 Standards & Interoperability

Standard	Status	Details
AGENTS.md	✅ Compliant	Agentic AI Foundation / Linux Foundation
A2A Protocol	✅ v0.3	Agent card at `.well-known/agent-card.json`
MCP	✅ 9 servers	Anthropic Model Context Protocol
spec-kit	✅ Compatible	GitHub's spec-driven development (see ADR-003)
GitHub Agentic Workflows	✅ 3 workflows	Triage, PR review, entropy scan

🧭 Governing Principles

These apply to every agent on every provider. They are rules, not suggestions.

👁️ What the agent can't see doesn't exist — all decisions, specs, and plans live inside the repo as files
⚙️ Mechanical enforcement over documentation — CI breaks on violations; written guidelines alone erode
🎯 One feature at a time — every session targets exactly one features.json entry
🧹 Clean state between sessions — committed, logged, and ready for any agent to resume
🔧 Ask what capability is missing — when a task stalls, build the missing tool, then retry
👀 Give the agent eyes — Playwright for UI, logs for backend, screenshots for visual
🗺️ A map, not a manual — AGENTS.md is concise and points to deeper docs

📚 References

Resource	Description
📄 AGENTS.md Specification	Agentic AI Foundation / Linux Foundation standard
📊 How to Write a Great AGENTS.md	GitHub's analysis of 2,500+ repositories
🧠 Anthropic: Effective Harnesses	The initializer + coding agent pattern
🤖 OpenAI: Harness Engineering	Codex in an agent-first world
📝 Martin Fowler: Harness Engineering	Enterprise-scale agentic maintenance
🔗 A2A Protocol	Google's Agent-to-Agent interoperability standard
📐 GitHub spec-kit	Spec-Driven Development methodology
⭐ Awesome Claude Code	Skills, hooks, and plugins for Claude Code
⭐ Awesome Copilot	Custom agents and instructions for Copilot
⭐ Awesome Skills	954+ agentic skills for AI coding assistants

🏗️ Built for agents. Maintained by agents. Governed by humans.

Fork it. Adapt features.json. Ship features.

Made with ❤️ by ArtemisAI
