Fork this repo. Point any AI agent at it. Run for months.
No context loss. No agent collisions. No wasted tokens.
Quick Start · How It Works · Skills · Providers · References
A harness is the tool shell that lets an AI agent affect the real world. If the model is the brain, the harness is the hands and feet.
Harness Engineering is a production-ready, provider-agnostic template that coordinates multiple AI coding agents across sessions, context windows, and providers, without losing state or duplicating work.
"The best way to use AI agents isn't to make them smarter; it's to give them better tools, clearer specs, and a shared memory."
| Problem | Solution |
|---|---|
| Agents lose context between sessions | claude-progress.txt: append-only handoff log every agent reads |
| Multiple agents collide on the same code | agents.json: live swarm manifest with heartbeat & collision avoidance |
| Expensive models waste tokens on cheap tasks | T1/T2/T3 cost tiers: Opus for architecture, Haiku for linting |
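To make the heartbeat idea in the table concrete, here is a minimal sketch of how a coordinator might flag stalled agents. The manifest shape (`agents`, `id`, `status`, `heartbeat`) and the 60-minute threshold are assumptions for illustration; the real agents.json schema may differ.

```python
from datetime import datetime, timedelta, timezone

# Assumed threshold: an agent is stale after 60 minutes without a heartbeat.
STALE_AFTER = timedelta(minutes=60)

def stalled_agents(manifest, now):
    """Return ids of active agents whose last heartbeat is older than STALE_AFTER.

    `manifest` follows a hypothetical shape:
    {"agents": [{"id": ..., "status": ..., "heartbeat": ISO-8601 string}]}
    """
    stalled = []
    for agent in manifest["agents"]:
        if agent["status"] != "active":
            continue  # finished or idle agents cannot stall
        last_beat = datetime.fromisoformat(agent["heartbeat"])
        if now - last_beat > STALE_AFTER:
            stalled.append(agent["id"])
    return stalled

example = {
    "agents": [
        {"id": "agent-a", "status": "active", "heartbeat": "2025-01-01T11:45:00+00:00"},
        {"id": "agent-b", "status": "active", "heartbeat": "2025-01-01T10:00:00+00:00"},
        {"id": "agent-c", "status": "done", "heartbeat": "2025-01-01T08:00:00+00:00"},
    ]
}
noon = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
print(stalled_agents(example, noon))
```

A feature claimed by a stalled agent could then be released back to the backlog, which is one plausible way to avoid both collisions and abandoned work.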
# 1. Clone the template
git clone https://github.com/ArtemisAI/Harness_Engineering.git my-project
cd my-project
# 2. Bootstrap the environment
bash init.sh
# 3. Start your first session
# Claude Code:
/session-start
# Every other provider:
@session-start
The Initializer agent runs once, adapts features.json to your project, and scaffolds the spec pipeline. From that point, every session starts with @session-start and ends with @session-end.
graph TB
subgraph "Cognitive Layer"
A["AGENTS.md<br/><i>Universal spec, all providers</i>"]
S["skills/<br/><i>10 reusable skill files</i>"]
C["constitution.md<br/><i>Project values & invariants</i>"]
end
subgraph "Agent Swarm"
O["Orchestrator<br/><b>T1 Heavy</b><br/>Assigns work, detects stalls"]
CD["Coding Agent(s)<br/><b>T2 Standard</b><br/>One feature per session"]
E["Entropy Agent<br/><b>T3 Fast</b><br/>Weekly cleanup scan"]
end
subgraph "State Layer"
F["features.json"]
AG["agents.json"]
I["issues.json"]
P["claude-progress.txt"]
end
subgraph "Execution Layer (MCP)"
FS["filesystem"]
G["git"]
GH["github"]
M["memory"]
PW["playwright"]
DW["deepwiki"]
C7["context7"]
end
A --> O
A --> CD
A --> E
S --> CD
O --> F
O --> AG
CD --> F
CD --> I
CD --> P
E --> I
CD --> FS
CD --> G
CD --> GH
CD --> DW
CD --> C7
style A fill:#4CAF50,color:#fff,stroke:#388E3C
style O fill:#FF9800,color:#fff,stroke:#F57C00
style CD fill:#2196F3,color:#fff,stroke:#1976D2
style E fill:#9C27B0,color:#fff,stroke:#7B1FA2
sequenceDiagram
participant H as Human
participant A as Agent
participant S as State Files
participant G as Git
H->>A: Start session
A->>A: bash init.sh
A->>S: Read features.json → pick highest-priority failing feature
A->>S: Read agents.json → register self, claim feature
A->>G: Commit registration
rect rgb(230, 245, 255)
Note over A,G: Implementation Phase
A->>A: Read spec → implement → test
A->>S: Update heartbeat every 30 min
end
A->>A: Run test_command ✅
A->>S: Update features.json → "passing"
A->>S: Write verification.md
A->>S: Append to claude-progress.txt
A->>S: Update agents.json → "done"
A->>G: Commit all changes
A->>H: Session complete, clean handoff
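The "pick highest-priority failing feature" and "claim feature" steps in the session flow could look roughly like the sketch below. The field names (`status`, `priority`, `feature`) and the convention that a lower number means higher priority are assumptions, not the template's confirmed schema.

```python
def pick_feature(features, agents, agent_id):
    """Return the id of the highest-priority failing feature that no other
    active agent has claimed, or None if nothing is available.

    Assumed shapes (hypothetical):
      features: [{"id": str, "status": str, "priority": int}]
      agents:   [{"id": str, "status": str, "feature": str}]
    """
    claimed = {
        a.get("feature")
        for a in agents
        if a["status"] == "active" and a["id"] != agent_id
    }
    candidates = [
        f for f in features
        if f["status"] == "failing" and f["id"] not in claimed
    ]
    if not candidates:
        return None
    # Assumption: a lower priority number means more urgent.
    return min(candidates, key=lambda f: f["priority"])["id"]

backlog = [
    {"id": "feat-001", "status": "passing", "priority": 1},
    {"id": "feat-002", "status": "failing", "priority": 2},
    {"id": "feat-003", "status": "failing", "priority": 3},
]
swarm = [{"id": "agent-a", "status": "active", "feature": "feat-002"}]
print(pick_feature(backlog, swarm, "agent-b"))
```

Skipping features already claimed by another active agent is what keeps two sessions from working on the same entry.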
Every feature follows the spec-driven pipeline, with no shortcuts. The stages run in this order:
| Stage | Output |
|---|---|
| 1. Requirements | requirements.md |
| 2. Design | design.md |
| 3. Tasks | tasks.md |
| 4. Implementation | src/ code |
| 5. Verification | verification.md |
Each stage produces a file under docs/features/<id>/. Templates are in docs/features/TEMPLATE/.
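As an illustration of how the pipeline's file layout can be checked mechanically, here is a small sketch. It assumes the three planning files are expected for every feature and that verification.md is expected only once a feature is marked passing; the helper names are hypothetical.

```python
from pathlib import Path

# File names come from the pipeline stages; treating the first three as
# mandatory for every feature is an assumption of this sketch.
STAGE_FILES = ["requirements.md", "design.md", "tasks.md"]

def missing_spec_files(present, status):
    """List the spec files a feature still lacks, given the names present."""
    required = STAGE_FILES + (["verification.md"] if status == "passing" else [])
    return [name for name in required if name not in present]

def present_files(repo_root, feature_id):
    """Collect file names under docs/features/<id>/ (empty set if absent)."""
    feature_dir = Path(repo_root) / "docs" / "features" / feature_id
    if not feature_dir.is_dir():
        return set()
    return {p.name for p in feature_dir.iterdir()}

print(missing_spec_files({"requirements.md", "design.md"}, "passing"))
```

A check like `missing_spec_files(present_files(".", "feat-001"), "passing")` could run before a feature's status is flipped to passing.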
All skills live in skills/ with YAML frontmatter for auto-discovery. Invoke by name:
| Skill | Purpose | Tier | Invoke |
|---|---|---|---|
| session-start | Orient session, register in agents.json | T2 | @session-start |
| session-end | Test → verify → commit → handoff | T2 | @session-end |
| implement-feature | Spec-driven implementation | T2 | @implement-feature |
| new-feature | Scaffold feature entry + spec files | T2 | @new-feature |
| run-tests | Narrowest-scope test run (3-attempt limit) | T2/T3 | @run-tests |
| create-issue | Log to issues.json + GitHub Issue | T3 | @create-issue |
| delegate-subagent | Spawn subagent in isolated worktree | T1 | @delegate-subagent |
| entropy-check | Find drift, dead code, broken refs | T3 | @entropy-check |
| write-adr | Record architecture decision | T1 | @write-adr |
| assign-agent | Orchestrator only: route from backlog | T1 | @assign-agent |
Claude Code uses /skill-name syntax. All other providers use @skill-name.
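The sketch below shows one way a tool might discover a skill from its frontmatter and build the right invocation string. The frontmatter keys (`name`, `tier`) and the provider identifier `"claude-code"` are assumptions; the parser handles only flat `key: value` pairs and is not a full YAML implementation.

```python
def read_frontmatter(text):
    """Parse flat `key: value` pairs between the leading '---' delimiters.

    This is a deliberate simplification, not a YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the frontmatter block
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta

def invocation(skill_name, provider):
    """Claude Code uses /skill-name; every other provider uses @skill-name."""
    prefix = "/" if provider == "claude-code" else "@"
    return prefix + skill_name

# Hypothetical skill file contents.
skill_text = "---\nname: run-tests\ntier: T3\n---\n# Run tests\n"
meta = read_frontmatter(skill_text)
print(invocation(meta["name"], "claude-code"), invocation(meta["name"], "cursor"))
```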
Works out-of-the-box with 14+ AI coding agents. All config files are pre-built and tested.
| Provider | Status | Instructions | MCP Config |
|---|---|---|---|
| Claude Code | ✅ | CLAUDE.md | .mcp.json |
| GitHub Copilot | ✅ | .github/copilot-instructions.md | .vscode/mcp.json |
| Gemini CLI | ✅ | .gemini/GEMINI.md | .gemini/settings.json |
| Cursor | ✅ | .cursor/rules/harness.mdc | .cursor/mcp.json |
| OpenAI Codex | ✅ | AGENTS.md | - |
| Roo Code | ✅ | .roo/rules/harness.md | .roo/mcp.json |
| Cline | ✅ | .clinerules | VS Code globalStorage |
| Windsurf | ✅ | .windsurf/rules/harness.md | Windsurf config |
| Antigravity | ✅ | AGENTS.md | via AGENTS.md |
| OpenCode | ✅ | opencode.json | opencode.json |
| Aider | ✅ | .aider.conf.yml | - |
| Goose | ✅ | .goosehints | Goose config |
| Zed | ✅ | .rules | .zed/settings.json |
| Warp | ✅ | .warp/agent-instructions.md | UI-managed |
Universal rule: instructions that all agents need go in AGENTS.md. Provider-specific details go in that provider's own config file only.
The orchestrator routes every task to the cheapest model capable of doing it correctly.
graph LR
subgraph "T1 Heavy"
H1["Claude Opus 4.6"]
H2["Gemini 2.5 Pro"]
end
subgraph "T2 Standard"
S1["Claude Sonnet 4.6"]
S2["GPT-4o"]
S3["Gemini 2.5 Flash"]
end
subgraph "T3 Fast"
F1["Claude Haiku 4.5"]
F2["GPT-4o-mini"]
F3["Kimi 2.5 / Grok fast"]
end
H1 -.->|"Architecture<br/>ADRs<br/>Orchestration"| USE1["T1 tasks"]
S1 -.->|"Implementation<br/>Code review<br/>Tests"| USE2["T2 tasks"]
F1 -.->|"Linting<br/>Search<br/>Entropy"| USE3["T3 tasks"]
style H1 fill:#e74c3c,color:#fff
style H2 fill:#e74c3c,color:#fff
style S1 fill:#3498db,color:#fff
style S2 fill:#3498db,color:#fff
style S3 fill:#3498db,color:#fff
style F1 fill:#2ecc71,color:#fff
style F2 fill:#2ecc71,color:#fff
style F3 fill:#2ecc71,color:#fff
| Tier | When to use | Cost |
|---|---|---|
| T1 Heavy | Architecture decisions, ADRs, orchestration, complex debugging | $$$ |
| T2 Standard | Feature implementation, code review, integration tests | $$ |
| T3 Fast | Doc scraping, web search, linting, entropy scans, test runs | $ |
T3 is 10 to 50× cheaper than T1. A documentation scrape doesn't need Opus.
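The routing rule in the tier table can be pictured as a simple lookup. This is only a sketch: the task-kind labels are hypothetical keys derived from the table, and falling back to T2 for unknown work is an assumption rather than documented behaviour.

```python
# Task kinds are illustrative labels taken from the "When to use" column.
TIER_BY_TASK = {
    "architecture": "T1",
    "adr": "T1",
    "orchestration": "T1",
    "complex-debugging": "T1",
    "implementation": "T2",
    "code-review": "T2",
    "integration-tests": "T2",
    "doc-scraping": "T3",
    "web-search": "T3",
    "linting": "T3",
    "entropy-scan": "T3",
    "test-run": "T3",
}

def route(task_kind, default="T2"):
    """Return the cheapest tier listed for a task kind.

    Unknown kinds fall back to `default` (an assumption of this sketch).
    """
    return TIER_BY_TASK.get(task_kind, default)

print(route("linting"), route("adr"), route("something-new"))
```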
Harness Engineering
├── AGENTS.md                      # Universal spec (< 200 lines, CI enforced)
├── CLAUDE.md                      # Claude Code config (imports AGENTS.md)
├── llms.txt                       # Machine-readable file index
├── init.sh                        # Session bootstrap script
│
├── State Files
│   ├── features.json              # Feature backlog (failing → in_progress → passing)
│   ├── agents.json                # Live swarm manifest with heartbeat
│   ├── issues.json                # Bug/blocker registry, synced to GitHub Issues
│   └── claude-progress.txt        # Append-only session handoff log
│
├── skills/                        # 10 universal skill definitions
│   ├── session-start.skill.md
│   ├── session-end.skill.md
│   ├── implement-feature.skill.md
│   └── ... (7 more)
│
├── harness/                       # Agent system prompts
│   ├── initializer/prompt.md      # First-session scaffolder
│   ├── session/prompt.md          # Per-session coding agent
│   ├── entropy/prompt.md          # Weekly cleanup agent
│   └── orchestrator/
│       ├── prompt.md              # Swarm coordinator
│       └── assignments/           # Per-feature delegation briefs
│
├── docs/
│   ├── constitution.md            # Project identity & values
│   ├── governance.md              # Autonomy scope, OWASP top 10
│   ├── project-structure.md       # Scaffold rules, naming, secrets
│   ├── decisions/                 # Architecture Decision Records
│   │   ├── 001-provider-agnostic-harness.md
│   │   ├── 002-a2a-subagent-delegation.md
│   │   ├── 003-spec-kit-alignment.md
│   │   └── 004-cost-tiered-model-selection.md
│   └── features/                  # Spec pipeline per feature
│       ├── TEMPLATE/              # requirements.md, design.md, tasks.md, verification.md
│       ├── feat-001/
│       └── ...
│
├── tests/                         # Test scripts per feature
├── schemas/                       # JSON Schema validation
├── .claude/                       # Hooks: bash guard, lint check, progress check
│
├── Provider Configs (14+)
│   ├── .mcp.json                  # Claude Code MCP
│   ├── .vscode/mcp.json           # VS Code / Copilot MCP
│   ├── .gemini/settings.json      # Gemini CLI
│   ├── .cursor/                   # Cursor rules + MCP
│   ├── .roo/                      # Roo Code rules + MCP
│   ├── .github/
│   │   ├── copilot-instructions.md
│   │   ├── agents/                # Copilot Coding Agent persona
│   │   ├── agentic-workflows/     # Auto-triage, PR review, entropy
│   │   └── workflows/             # CI: checks, entropy, code review
│   └── ... (windsurf, warp, zed, opencode, aider, goose, etc.)
│
└── .well-known/
    └── agent-card.json            # A2A v0.3 agent card
Two layers work together. Skills teach agents what to do; MCP servers give them the tools to do it:
Agent prompt
→ loads skill: session-start.skill.md ← cognitive layer (what to do)
→ calls MCP: git_status ← execution layer (actually does it)
→ calls MCP: read_file(features.json)
→ calls MCP: bash(pytest ...) ← via PTY sandbox
→ calls MCP: write_file(progress.txt)
| Server | Layer | Purpose |
|---|---|---|
| filesystem | Execution | Read/write project files |
| git | Execution | Commits, diffs, log, branching |
| github | Execution | Issues, PRs, code search, CI status |
| memory | Execution | Cross-session knowledge graph |
| playwright | Execution | Browser automation, UI verification |
| fetch | Execution | Web content retrieval |
| sequential-thinking | Execution | Externalised multi-step planning |
| deepwiki | Grounding | Q&A about any public GitHub repository |
| context7 | Grounding | Current library docs, indexed daily |
Always query deepwiki or context7 before implementing any library call. Grounding each call in current documentation guards against API hallucination.
| Rule | |
|---|---|
| Always | Run tests before committing · Update heartbeat · Open issues for out-of-scope work · Query grounding servers |
| Ask first | Irreversible actions · New dependencies · 3 consecutive failures · Files outside src/, tests/, docs/ |
| Never | WIP on main · Scope expansion · rm/del/kill/sudo · Upward layer imports · Secrets in git |
Every push triggers automated checks:
| Job | What it validates |
|---|---|
| Structural integrity | Required files exist, JSON schemas valid, AGENTS.md < 200 lines |
| Secret scan | No API keys, tokens, private keys, or JWTs in committed files |
| Layer order | Types → Config → Repo → Service → Runtime → UI, with no upward imports |
| Feature specs | Every feature has a spec dir; every passing feature has verification.md |
| Progress log | claude-progress.txt is non-empty |
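The layer-order job can be illustrated with a short sketch. It assumes that "no upward imports" means a module may import only from its own layer or from layers earlier in the chain, and that each import has already been reduced to an (importer layer, imported layer) pair; how the real CI job extracts those pairs is not shown in this document.

```python
# Layer names follow the chain in the CI table, lowest layer first.
LAYERS = ["types", "config", "repo", "service", "runtime", "ui"]
RANK = {name: index for index, name in enumerate(LAYERS)}

def upward_imports(edges):
    """Return the (importer, imported) pairs that reach into a later layer.

    `edges` is an iterable of (importer_layer, imported_layer) tuples.
    Assumption: importing from the same or an earlier layer is allowed.
    """
    return [(src, dst) for src, dst in edges if RANK[dst] > RANK[src]]

sample_edges = [
    ("service", "repo"),   # allowed: service depends on an earlier layer
    ("repo", "service"),   # violation: repo reaches up into service
    ("ui", "types"),       # allowed
]
print(upward_imports(sample_edges))
```

A CI job built this way would fail the build whenever the returned list is non-empty.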
| Hook | Trigger | Action |
|---|---|---|
| pre_bash_guard | Before any shell command | Blocks rm, del, kill, sudo, &&, … |
| post_lint_check | After file edit/write | Runs language-appropriate linter |
| stop_progress_check | Session end | Warns if claude-progress.txt not updated |
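A guard like pre_bash_guard could be approximated as below. This is a hedged sketch, not the repository's hook: the blocked list covers only the entries visible in the table, and the matching is deliberately conservative (any token equal to a blocked command is rejected, so `git rm` is blocked too).

```python
import shlex

BLOCKED_COMMANDS = {"rm", "del", "kill", "sudo"}
BLOCKED_OPERATORS = ("&&",)  # only the operator visible in the table above

def is_blocked(command):
    """Return True if the shell command should be rejected by the guard."""
    if any(op in command for op in BLOCKED_OPERATORS):
        return True
    try:
        tokens = shlex.split(command)
    except ValueError:
        return True  # unparseable input (e.g. unbalanced quotes) is rejected
    return any(token in BLOCKED_COMMANDS for token in tokens)

for cmd in ["pytest tests/", "rm -rf build", "git status && sudo ls"]:
    print(cmd, "->", "blocked" if is_blocked(cmd) else "allowed")
```

Rejecting on the side of caution fits the principle of mechanical enforcement: a false positive costs a retry, while a missed destructive command may be irreversible.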
| Standard | Status | Details |
|---|---|---|
| AGENTS.md | ✅ Compliant | Agentic AI Foundation / Linux Foundation |
| A2A Protocol | ✅ v0.3 | Agent card at .well-known/agent-card.json |
| MCP | ✅ 9 servers | Anthropic Model Context Protocol |
| spec-kit | ✅ Compatible | GitHub's spec-driven development (see ADR-003) |
| GitHub Agentic Workflows | ✅ 3 workflows | Triage, PR review, entropy scan |
These apply to every agent on every provider. They are rules, not suggestions.
- What the agent can't see doesn't exist: all decisions, specs, and plans live inside the repo as files
- Mechanical enforcement over documentation: CI breaks on violations; written guidelines alone erode
- One feature at a time: every session targets exactly one features.json entry
- Clean state between sessions: committed, logged, and ready for any agent to resume
- Ask what capability is missing: when a task stalls, build the missing tool, then retry
- Give the agent eyes: Playwright for UI, logs for backend, screenshots for visual
- A map, not a manual: AGENTS.md is concise and points to deeper docs
| Resource | Description |
|---|---|
| AGENTS.md Specification | Agentic AI Foundation / Linux Foundation standard |
| How to Write a Great AGENTS.md | GitHub's analysis of 2,500+ repositories |
| Anthropic: Effective Harnesses | The initializer + coding agent pattern |
| OpenAI: Harness Engineering | Codex in an agent-first world |
| Martin Fowler: Harness Engineering | Enterprise-scale agentic maintenance |
| A2A Protocol | Google's Agent-to-Agent interoperability standard |
| GitHub spec-kit | Spec-Driven Development methodology |
| Awesome Claude Code | Skills, hooks, and plugins for Claude Code |
| Awesome Copilot | Custom agents and instructions for Copilot |
| Awesome Skills | 954+ agentic skills for AI coding assistants |