ArtemisAI/Harness_Engineering

🏗️ Harness Engineering

The universal template for multi-agent AI coding swarms



Fork this repo. Point any AI agent at it. Run for months.

No context loss. No agent collisions. No wasted tokens.

🚀 Quick Start · 📖 How It Works · 🛠️ Skills · 🤖 Providers · 📚 References


💡 What Is This?

A harness is the tool shell that lets an AI agent affect the real world. If the model is the brain, the harness is the hands and feet.

Harness Engineering is a production-ready, provider-agnostic template that coordinates multiple AI coding agents across sessions, context windows, and providers without losing state or duplicating work.

🧠 "The best way to use AI agents isn't to make them smarter; it's to give them better tools, clearer specs, and a shared memory."

🎯 Three problems, three solutions

| 😰 Problem | ✅ Solution |
| --- | --- |
| Agents lose context between sessions | 📝 claude-progress.txt: append-only handoff log every agent reads |
| Multiple agents collide on the same code | 🤝 agents.json: live swarm manifest with heartbeat & collision avoidance |
| Expensive models waste tokens on cheap tasks | 💰 T1/T2/T3 cost tiers: Opus for architecture, Haiku for linting |
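
As a rough illustration of how a heartbeat in agents.json could support stall detection and collision avoidance, here is a minimal sketch. The entry fields (`id`, `status`, `feature`, `heartbeat`), the `"active"` status value, and the 60-minute stall threshold are assumptions made for this example; the real schema lives in the repository's schemas/ directory and may differ.

```python
from datetime import datetime, timedelta, timezone

# Assumption: an agent is considered stalled after two missed 30-minute heartbeats.
STALL_AFTER = timedelta(minutes=60)


def find_stalled(agents, now, stall_after=STALL_AFTER):
    """Return ids of active agents whose last heartbeat is older than stall_after."""
    stalled = []
    for agent in agents:
        if agent["status"] != "active":
            continue  # finished agents cannot stall
        last_beat = datetime.fromisoformat(agent["heartbeat"])
        if now - last_beat > stall_after:
            stalled.append(agent["id"])
    return stalled


def claimed_features(agents):
    """Map feature id -> agent id for every active claim (hypothetical field names)."""
    return {a["feature"]: a["id"] for a in agents if a["status"] == "active"}


agents = [
    {"id": "coder-1", "status": "active", "feature": "feat-001",
     "heartbeat": "2025-01-01T10:00:00+00:00"},
    {"id": "coder-2", "status": "active", "feature": "feat-002",
     "heartbeat": "2025-01-01T11:45:00+00:00"},
    {"id": "coder-3", "status": "done", "feature": "feat-003",
     "heartbeat": "2025-01-01T08:00:00+00:00"},
]
now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
print(find_stalled(agents, now))    # ['coder-1']
print(claimed_features(agents))
```

An orchestrator could run a check like this before assigning work, skipping any feature that already appears in the claim map.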

🚀 Quick Start

# 1. Clone the template
git clone https://github.com/ArtemisAI/Harness_Engineering.git my-project
cd my-project

# 2. Bootstrap the environment
bash init.sh

# 3. Start your first session
# Claude Code:
/session-start

# Every other provider:
@session-start

The Initializer agent runs once, adapts features.json to your project, and scaffolds the spec pipeline. From that point, every session starts with @session-start and ends with @session-end (/session-start and /session-end in Claude Code).


📖 How It Works

🏛️ Architecture

graph TB
    subgraph "🧠 Cognitive Layer"
        A["📄 AGENTS.md<br/><i>Universal spec, all providers</i>"]
        S["📂 skills/<br/><i>10 reusable skill files</i>"]
        C["📜 constitution.md<br/><i>Project values & invariants</i>"]
    end

    subgraph "👥 Agent Swarm"
        O["🎯 Orchestrator<br/><b>T1 Heavy</b><br/>Assigns work, detects stalls"]
        CD["💻 Coding Agent(s)<br/><b>T2 Standard</b><br/>One feature per session"]
        E["🧹 Entropy Agent<br/><b>T3 Fast</b><br/>Weekly cleanup scan"]
    end

    subgraph "💾 State Layer"
        F["features.json"]
        AG["agents.json"]
        I["issues.json"]
        P["claude-progress.txt"]
    end

    subgraph "🔧 Execution Layer (MCP)"
        FS["filesystem"]
        G["git"]
        GH["github"]
        M["memory"]
        PW["playwright"]
        DW["deepwiki"]
        C7["context7"]
    end

    A --> O
    A --> CD
    A --> E
    S --> CD
    O --> F
    O --> AG
    CD --> F
    CD --> I
    CD --> P
    E --> I
    CD --> FS
    CD --> G
    CD --> GH
    CD --> DW
    CD --> C7

    style A fill:#4CAF50,color:#fff,stroke:#388E3C
    style O fill:#FF9800,color:#fff,stroke:#F57C00
    style CD fill:#2196F3,color:#fff,stroke:#1976D2
    style E fill:#9C27B0,color:#fff,stroke:#7B1FA2

🔄 Session Lifecycle

sequenceDiagram
    participant H as 👤 Human
    participant A as 🤖 Agent
    participant S as 💾 State Files
    participant G as 🔧 Git

    H->>A: Start session
    A->>A: bash init.sh
    A->>S: Read features.json → pick highest-priority failing feature
    A->>S: Read agents.json → register self, claim feature
    A->>G: Commit registration

    rect rgb(230, 245, 255)
        Note over A,G: 💻 Implementation Phase
        A->>A: Read spec → implement → test
        A->>S: Update heartbeat every 30 min
    end

    A->>A: Run test_command ✅
    A->>S: Update features.json → "passing"
    A->>S: Write verification.md
    A->>S: Append to claude-progress.txt
    A->>S: Update agents.json → "done"
    A->>G: Commit all changes
    A->>H: Session complete, clean handoff
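
The first steps of the lifecycle (pick the highest-priority failing feature, then register and claim it) might look roughly like the sketch below. It works on in-memory lists rather than the real files, and the field names (`id`, `priority`, `status`, `feature`), the `"active"` status, and the "lower number means higher priority" convention are all assumptions for illustration. The `failing`, `in_progress`, and `passing` statuses are taken from the project structure described in this README.

```python
def pick_feature(features, agents):
    """Pick the highest-priority failing feature that no active agent has claimed."""
    claimed = {a["feature"] for a in agents if a["status"] == "active"}
    candidates = [
        f for f in features
        if f["status"] == "failing" and f["id"] not in claimed
    ]
    # Assumption: a lower priority number means more urgent.
    return min(candidates, key=lambda f: f["priority"], default=None)


def claim(features, agents, agent_id):
    """Register agent_id against the chosen feature; return its id or None."""
    feature = pick_feature(features, agents)
    if feature is None:
        return None  # nothing left to work on
    feature["status"] = "in_progress"
    agents.append({"id": agent_id, "status": "active", "feature": feature["id"]})
    return feature["id"]


features = [
    {"id": "feat-001", "priority": 1, "status": "failing"},
    {"id": "feat-002", "priority": 2, "status": "failing"},
    {"id": "feat-003", "priority": 3, "status": "failing"},
    {"id": "feat-004", "priority": 0, "status": "passing"},
]
agents = [{"id": "coder-1", "status": "active", "feature": "feat-001"}]

print(claim(features, agents, "coder-2"))  # feat-002
```

In the real harness the two lists would be read from and written back to features.json and agents.json, followed by the registration commit shown in the diagram.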

📋 Feature Flow

Every feature follows the spec-driven pipeline, with no shortcuts:

📝 Requirements  →  📐 Design  →  ✅ Tasks  →  💻 Implementation  →  🧪 Verification
     ↓                  ↓             ↓               ↓                    ↓
requirements.md    design.md     tasks.md      src/ code          verification.md

Each stage produces a file under docs/features/<id>/. Templates are in docs/features/TEMPLATE/.
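
To make the directory convention concrete, here is a small sketch that copies docs/features/TEMPLATE/ into a new docs/features/<id>/ directory and reports which spec files are still missing. The function names `scaffold_feature` and `missing_specs` are hypothetical helpers written for this example and are not part of the repository; the demo runs against a throwaway temporary directory.

```python
import shutil
import tempfile
from pathlib import Path

# The four spec files listed for docs/features/TEMPLATE/ in this README.
SPEC_FILES = ["requirements.md", "design.md", "tasks.md", "verification.md"]


def scaffold_feature(repo_root, feature_id):
    """Copy docs/features/TEMPLATE/ to docs/features/<feature_id>/ if it is new."""
    features_dir = Path(repo_root) / "docs" / "features"
    target = features_dir / feature_id
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    shutil.copytree(features_dir / "TEMPLATE", target)
    return target


def missing_specs(feature_dir):
    """Return the spec file names that are absent from feature_dir."""
    return [name for name in SPEC_FILES if not (Path(feature_dir) / name).is_file()]


# Demo in a temporary directory so nothing touches a real checkout.
with tempfile.TemporaryDirectory() as root:
    template = Path(root) / "docs" / "features" / "TEMPLATE"
    template.mkdir(parents=True)
    for name in SPEC_FILES:
        (template / name).write_text(f"# {name}\n")
    created = scaffold_feature(root, "feat-002")
    print(created.name, missing_specs(created))  # feat-002 []
```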


🛠️ Skills

All skills live in skills/ with YAML frontmatter for auto-discovery. Invoke by name:

| Skill | Purpose | Tier | Invoke |
| --- | --- | --- | --- |
| 🟢 session-start | Orient session, register in agents.json | T2 | @session-start |
| 🔴 session-end | Test → verify → commit → handoff | T2 | @session-end |
| ⚙️ implement-feature | Spec-driven implementation | T2 | @implement-feature |
| 🆕 new-feature | Scaffold feature entry + spec files | T2 | @new-feature |
| 🧪 run-tests | Narrowest-scope test run (3-attempt limit) | T2/T3 | @run-tests |
| 🐛 create-issue | Log to issues.json + GitHub Issue | T3 | @create-issue |
| 🔀 delegate-subagent | Spawn subagent in isolated worktree | T1 | @delegate-subagent |
| 🧹 entropy-check | Find drift, dead code, broken refs | T3 | @entropy-check |
| 📝 write-adr | Record architecture decision | T1 | @write-adr |
| 🎯 assign-agent | Orchestrator only: route from backlog | T1 | @assign-agent |

💡 Claude Code uses /skill-name syntax. All other providers use @skill-name.
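
For readers wondering what frontmatter-based discovery could involve, the sketch below parses a leading `---` delimited block of simple `key: value` lines and builds the invocation string for a provider. The frontmatter keys (`name`, `tier`) and the provider identifier `"claude-code"` are assumptions for this example; the actual skill files may use different keys, and a real implementation would likely use a full YAML parser.

```python
def parse_frontmatter(text):
    """Parse a leading '---' delimited block of simple 'key: value' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta  # closing delimiter reached
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}  # no closing delimiter: treat the file as having no frontmatter


def invocation(skill_name, provider):
    """Claude Code uses /skill-name; every other provider uses @skill-name."""
    prefix = "/" if provider == "claude-code" else "@"
    return prefix + skill_name


# Hypothetical skill file content, for illustration only.
skill_text = "---\nname: session-start\ntier: T2\n---\n# Session start\n"
meta = parse_frontmatter(skill_text)
print(meta)                                     # {'name': 'session-start', 'tier': 'T2'}
print(invocation(meta["name"], "claude-code"))  # /session-start
print(invocation(meta["name"], "cursor"))       # @session-start
```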


🤖 Provider Support

Works out-of-the-box with 14+ AI coding agents. All config files are pre-built and tested.

| Provider | Status | Instructions | MCP Config |
| --- | --- | --- | --- |
| Claude Code | ✅ | CLAUDE.md | .mcp.json |
| GitHub Copilot | ✅ | .github/copilot-instructions.md | .vscode/mcp.json |
| Gemini CLI | ✅ | .gemini/GEMINI.md | .gemini/settings.json |
| Cursor | ✅ | .cursor/rules/harness.mdc | .cursor/mcp.json |
| OpenAI Codex | ✅ | AGENTS.md | - |
| Roo Code | ✅ | .roo/rules/harness.md | .roo/mcp.json |
| Cline | ✅ | .clinerules | VS Code globalStorage |
| Windsurf | ✅ | .windsurf/rules/harness.md | Windsurf config |
| Antigravity | ✅ | AGENTS.md | via AGENTS.md |
| OpenCode | ✅ | opencode.json | opencode.json |
| Aider | ✅ | .aider.conf.yml | - |
| Goose | ✅ | .goosehints | Goose config |
| Zed | ✅ | .rules | .zed/settings.json |
| Warp | ✅ | .warp/agent-instructions.md | UI-managed |

🌍 Universal rule: Instructions that all agents need → put in AGENTS.md. Provider-specific details → that provider's own config file only.


💰 Cost Tier System

The orchestrator routes every task to the cheapest model capable of doing it correctly.

graph LR
    subgraph "🔴 T1 - Heavy"
        H1["Claude Opus 4.6"]
        H2["Gemini 2.5 Pro"]
    end

    subgraph "🔵 T2 - Standard"
        S1["Claude Sonnet 4.6"]
        S2["GPT-4o"]
        S3["Gemini 2.5 Flash"]
    end

    subgraph "🟢 T3 - Fast"
        F1["Claude Haiku 4.5"]
        F2["GPT-4o-mini"]
        F3["Kimi 2.5 / Grok fast"]
    end

    H1 -.->|"Architecture<br/>ADRs<br/>Orchestration"| USE1["🏛️"]
    S1 -.->|"Implementation<br/>Code review<br/>Tests"| USE2["💻"]
    F1 -.->|"Linting<br/>Search<br/>Entropy"| USE3["⚡"]

    style H1 fill:#e74c3c,color:#fff
    style H2 fill:#e74c3c,color:#fff
    style S1 fill:#3498db,color:#fff
    style S2 fill:#3498db,color:#fff
    style S3 fill:#3498db,color:#fff
    style F1 fill:#2ecc71,color:#fff
    style F2 fill:#2ecc71,color:#fff
    style F3 fill:#2ecc71,color:#fff
| Tier | When to use | Cost |
| --- | --- | --- |
| 🔴 T1 Heavy | Architecture decisions, ADRs, orchestration, complex debugging | $$$ |
| 🔵 T2 Standard | Feature implementation, code review, integration tests | $$ |
| 🟢 T3 Fast | Doc scraping, web search, linting, entropy scans, test runs | $ |

💡 T3 is 10–50× cheaper than T1. A documentation scrape doesn't need Opus.
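
The routing rule in the tier table can be pictured as a simple lookup. The sketch below is illustrative only: the task-kind labels are made up for this example, and falling back to T2 for unknown tasks is an assumption, not documented behavior of the orchestrator.

```python
# Task kinds are hypothetical labels derived from the "When to use" column.
TIER_BY_TASK = {
    "architecture": "T1",
    "adr": "T1",
    "orchestration": "T1",
    "complex-debugging": "T1",
    "implementation": "T2",
    "code-review": "T2",
    "integration-tests": "T2",
    "doc-scraping": "T3",
    "web-search": "T3",
    "linting": "T3",
    "entropy-scan": "T3",
    "test-run": "T3",
}


def route(task_kind, default="T2"):
    """Return the cost tier for a task kind; unknown kinds fall back to default."""
    return TIER_BY_TASK.get(task_kind, default)


print(route("linting"))       # T3
print(route("adr"))           # T1
print(route("unknown-task"))  # T2
```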


📁 Project Structure

📦 Harness Engineering
├── 📄 AGENTS.md                    # 🌍 Universal spec (< 200 lines, CI enforced)
├── 📄 CLAUDE.md                    # Claude Code config (imports AGENTS.md)
├── 📄 llms.txt                     # Machine-readable file index
├── 📄 init.sh                      # 🚀 Session bootstrap script
│
├── 💾 State Files
│   ├── features.json               # Feature backlog (failing → in_progress → passing)
│   ├── agents.json                  # Live swarm manifest with heartbeat
│   ├── issues.json                  # Bug/blocker registry → synced to GitHub Issues
│   └── claude-progress.txt          # Append-only session handoff log
│
├── 🛠️ skills/                      # 10 universal skill definitions
│   ├── session-start.skill.md
│   ├── session-end.skill.md
│   ├── implement-feature.skill.md
│   └── ... (7 more)
│
├── 🤖 harness/                     # Agent system prompts
│   ├── initializer/prompt.md       # First-session scaffolder
│   ├── session/prompt.md           # Per-session coding agent
│   ├── entropy/prompt.md           # Weekly cleanup agent
│   └── orchestrator/
│       ├── prompt.md               # Swarm coordinator
│       └── assignments/            # Per-feature delegation briefs
│
├── 📚 docs/
│   ├── constitution.md             # Project identity & values
│   ├── governance.md               # Autonomy scope, OWASP top 10
│   ├── project-structure.md        # Scaffold rules, naming, secrets
│   ├── decisions/                  # Architecture Decision Records
│   │   ├── 001-provider-agnostic-harness.md
│   │   ├── 002-a2a-subagent-delegation.md
│   │   ├── 003-spec-kit-alignment.md
│   │   └── 004-cost-tiered-model-selection.md
│   └── features/                   # Spec pipeline per feature
│       ├── TEMPLATE/               # requirements.md, design.md, tasks.md, verification.md
│       ├── feat-001/
│       └── ...
│
├── 🧪 tests/                       # Test scripts per feature
├── 📋 schemas/                      # JSON Schema validation
├── 🔒 .claude/                      # Hooks: bash guard, lint check, progress check
│
├── 🔧 Provider Configs (14+)
│   ├── .mcp.json                   # Claude Code MCP
│   ├── .vscode/mcp.json            # VS Code / Copilot MCP
│   ├── .gemini/settings.json       # Gemini CLI
│   ├── .cursor/                    # Cursor rules + MCP
│   ├── .roo/                       # Roo Code rules + MCP
│   ├── .github/
│   │   ├── copilot-instructions.md
│   │   ├── agents/                 # Copilot Coding Agent persona
│   │   ├── agentic-workflows/      # Auto-triage, PR review, entropy
│   │   └── workflows/              # CI: checks, entropy, code review
│   └── ... (windsurf, warp, zed, opencode, aider, goose, etc.)
│
└── 🌐 .well-known/
    └── agent-card.json             # A2A v0.3 agent card

🔧 MCP Servers

Two layers work together. Skills teach agents what to do; MCP servers give them the tools to do it:

🧠 Agent prompt
  → loads skill: session-start.skill.md     ← cognitive layer (what to do)
  → calls MCP:   git_status                 ← execution layer (actually does it)
  → calls MCP:   read_file(features.json)
  → calls MCP:   bash(pytest ...)           ← via PTY sandbox
  → calls MCP:   write_file(progress.txt)

| Server | Layer | Purpose |
| --- | --- | --- |
| 📁 filesystem | Execution | Read/write project files |
| 🔀 git | Execution | Commits, diffs, log, branching |
| 🐙 github | Execution | Issues, PRs, code search, CI status |
| 🧠 memory | Execution | Cross-session knowledge graph |
| 🎭 playwright | Execution | Browser automation, UI verification |
| 🌐 fetch | Execution | Web content retrieval |
| 🧩 sequential-thinking | Execution | Externalised multi-step planning |
| 📖 deepwiki | Grounding | Q&A about any public GitHub repository |
| 📚 context7 | Grounding | Current library docs, indexed daily |

⚠️ Always query deepwiki or context7 before implementing any library call. Grounding each call in current documentation reduces API hallucination.


🛡️ Safety & Governance

Agent Boundaries (Always / Ask / Never)

| | Rule |
| --- | --- |
| ✅ Always | Run tests before committing · Update heartbeat · Open issues for out-of-scope work · Query grounding servers |
| ⚠️ Ask First | Irreversible actions · New dependencies · 3 consecutive failures · Files outside src/, tests/, docs/ |
| 🚫 Never | WIP on main · Scope expansion · rm/del/kill/sudo · Upward layer imports · Secrets in git |

CI Pipeline

Every push triggers automated checks:

| Job | What it validates |
| --- | --- |
| 🏗️ Structural integrity | Required files exist, JSON schemas valid, AGENTS.md < 200 lines |
| 🔐 Secret scan | No API keys, tokens, private keys, or JWTs in committed files |
| 📐 Layer order | Types → Config → Repo → Service → Runtime → UI, with no upward imports |
| 📋 Feature specs | Every feature has a spec dir; every passing feature has verification.md |
| 📝 Progress log | claude-progress.txt is non-empty |
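
One way to picture the layer-order check: rank the layers in the order given above and flag any import whose target ranks higher than its source. This sketch assumes that "upward" means importing from a layer later in the chain (for example, Repo importing Service), and it takes pre-extracted (importer, imported) layer pairs as input; how the real CI job extracts imports from source files is not described here.

```python
# Layer names in the order given by the CI table, lowest first (assumed direction).
LAYERS = ["types", "config", "repo", "service", "runtime", "ui"]
RANK = {name: index for index, name in enumerate(LAYERS)}


def upward_imports(imports):
    """Return the (importer, imported) pairs where the imported layer ranks higher."""
    return [(src, dst) for src, dst in imports if RANK[dst] > RANK[src]]


observed = [
    ("service", "repo"),   # fine: service depends on a lower layer
    ("repo", "service"),   # violation: repo reaches up into service
    ("ui", "types"),       # fine
    ("types", "ui"),       # violation
]
print(upward_imports(observed))  # [('repo', 'service'), ('types', 'ui')]
```

A CI job built on this idea would fail the build whenever the returned list is non-empty.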

Hooks (Claude Code)

| Hook | Trigger | Action |
| --- | --- | --- |
| 🛑 pre_bash_guard | Before any shell command | Blocks rm, del, kill, sudo, &&, ` |
| 🔍 post_lint_check | After file edit/write | Runs language-appropriate linter |
| 📝 stop_progress_check | Session end | Warns if claude-progress.txt not updated |
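
A pre-command guard of the kind described for pre_bash_guard could be approximated as follows. This is a deliberately conservative sketch, not the repository's hook: it covers only the commands and the `&&` operator named in the table, and it also rejects harmless commands that merely mention a blocked word as a separate token (for example `git rm`).

```python
import shlex

BLOCKED_COMMANDS = {"rm", "del", "kill", "sudo"}
BLOCKED_OPERATORS = ("&&",)


def is_blocked(command):
    """Return True if the shell command should be refused."""
    if any(op in command for op in BLOCKED_OPERATORS):
        return True
    try:
        tokens = shlex.split(command)
    except ValueError:
        return True  # unbalanced quotes: refuse rather than guess
    return any(token in BLOCKED_COMMANDS for token in tokens)


print(is_blocked("pytest tests/ -q"))           # False
print(is_blocked("sudo apt install jq"))        # True
print(is_blocked("git status && git push"))     # True
print(is_blocked("grep term notes.txt"))        # False
```

Matching whole tokens rather than substrings keeps words such as "term" from tripping the `rm` rule, at the cost of some false positives on subcommands.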

🌐 Standards & Interoperability

| Standard | Status | Details |
| --- | --- | --- |
| AGENTS.md | ✅ Compliant | Agentic AI Foundation / Linux Foundation |
| A2A Protocol | ✅ v0.3 | Agent card at .well-known/agent-card.json |
| MCP | ✅ 9 servers | Anthropic Model Context Protocol |
| spec-kit | ✅ Compatible | GitHub's spec-driven development (see ADR-003) |
| GitHub Agentic Workflows | ✅ 3 workflows | Triage, PR review, entropy scan |

🧭 Governing Principles

These apply to every agent on every provider. They are rules, not suggestions.

  1. 👁️ What the agent can't see doesn't exist: all decisions, specs, and plans live inside the repo as files
  2. ⚙️ Mechanical enforcement over documentation: CI breaks on violations; written guidelines alone erode
  3. 🎯 One feature at a time: every session targets exactly one features.json entry
  4. 🧹 Clean state between sessions: committed, logged, and ready for any agent to resume
  5. 🔧 Ask what capability is missing: when a task stalls, build the missing tool, then retry
  6. 👀 Give the agent eyes: Playwright for UI, logs for backend, screenshots for visual
  7. 🗺️ A map, not a manual: AGENTS.md is concise and points to deeper docs

📚 References

| Resource | Description |
| --- | --- |
| 📄 AGENTS.md Specification | Agentic AI Foundation / Linux Foundation standard |
| 📊 How to Write a Great AGENTS.md | GitHub's analysis of 2,500+ repositories |
| 🧠 Anthropic: Effective Harnesses | The initializer + coding agent pattern |
| 🤖 OpenAI: Harness Engineering | Codex in an agent-first world |
| 📝 Martin Fowler: Harness Engineering | Enterprise-scale agentic maintenance |
| 🔗 A2A Protocol | Google's Agent-to-Agent interoperability standard |
| 📐 GitHub spec-kit | Spec-Driven Development methodology |
| ⭐ Awesome Claude Code | Skills, hooks, and plugins for Claude Code |
| ⭐ Awesome Copilot | Custom agents and instructions for Copilot |
| ⭐ Awesome Skills | 954+ agentic skills for AI coding assistants |

🏗️ Built for agents. Maintained by agents. Governed by humans.

Fork it. Adapt features.json. Ship features.


Made with ❤️ by ArtemisAI

About

A template project for harness engineering and for grounding long-running swarms of AI coding agents
