Agentbox ships ~200 tests across nine categories. This doc covers how to run them and how to add your own.
Testing in this repo serves two distinct contracts: the adapter contract (every impl of every slot must satisfy the same parameterised assertions — ADR-005 §Service-level objectives) and the runtime contract (every boot must satisfy PRD-002 / PRD-003 acceptance criteria, mapped 1:1 onto tests/runtime-contract/RC-*.sh scripts). Beyond those two, the suite covers validator semantics, TUI round-tripping, per-feature artefact probes, reproducibility of Nix builds, and the Nostr bridge. Read this file when you add a feature (you will almost certainly need a test in at least two categories) or when a PR fails CI and you need to know which workflow to look at.
```mermaid
graph TB
  subgraph contracts["Two core contracts"]
    AC["Adapter contract<br/>5 slots x 3 impls<br/>parameterised Jest"]
    RC["Runtime contract<br/>PRD-002 / PRD-003<br/>bash TAP scripts"]
  end
  subgraph supporting["Supporting suites"]
    VS["Validator semantics"]
    TUI["TUI round-trip"]
    AP["Artefact probes"]
    NB["Nostr bridge"]
    RB["Reproducibility"]
    OB["Observability"]
  end
  AC -->|"enforces"| ADR["ADR-005<br/>SLOs per slot"]
  RC -->|"maps to"| PRD["PRD-002 / PRD-003<br/>acceptance criteria"]
```
- Contract test: a parameterised Jest suite under `tests/contract/<slot>.contract.spec.js` that runs the same assertions against every impl class for a slot. Adding an impl adds a row to the parameter list; the harness runs automatically.
- Runtime-contract test: a bash script under `tests/runtime-contract/RC-<prd>-<nn>.sh` mapping to one PRD-002/003 acceptance criterion. TAP output, skip-77 semantics.
- TAP: Test Anything Protocol; `ok N` / `not ok N` / `1..N`. All bash suites emit TAP so a single runner can aggregate them.
- skip-77: the convention that a bash test exits `77` when a prerequisite (Docker, Nix, GPU, SSD) is missing; the runner treats 77 as "skipped, not failed".
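The TAP and skip-77 conventions above can be sketched as a handful of shell helpers. This is a minimal illustration, not the repo's actual harness: the function names `tap_plan`, `tap_check`, and `require_tool` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical TAP helpers; names are illustrative, not taken from the repo.

tap_count=0

# Print the TAP plan line, e.g. "1..3".
tap_plan() { printf '1..%d\n' "$1"; }

# Run a command; emit "ok N - desc" on success, "not ok N - desc" otherwise.
tap_check() {
  local desc=$1; shift
  tap_count=$((tap_count + 1))
  if "$@" >/dev/null 2>&1; then
    printf 'ok %d - %s\n' "$tap_count" "$desc"
  else
    printf 'not ok %d - %s\n' "$tap_count" "$desc"
    return 1
  fi
}

# Return 77 when a required tool is absent, so the caller can skip.
require_tool() {
  command -v "$1" >/dev/null 2>&1 && return 0
  printf '1..0 # SKIP %s not found\n' "$1"
  return 77
}
```

A script built on these would typically start with `require_tool docker || exit $?`, run its `tap_check` calls while remembering any failure, and finish by printing `tap_plan "$tap_count"` before exiting 0 or 1.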
```mermaid
graph LR
  subgraph jest["Jest (JavaScript)"]
    CONTRACT["contract/"]
    INTEG["integration/"]
    SOV["sovereign/"]
    CONFIG["config/"]
    OBSERVE["observability/"]
  end
  subgraph pytest["pytest (Python)"]
    TUITEST["tui/"]
  end
  subgraph bash["Bash (TAP)"]
    RCTEST["runtime-contract/"]
    BOOTTEST["bootstrap/"]
    CLITEST["cli/"]
    FLAKETEST["flake/"]
    PROBES["artefact-probes/"]
    SECTEST["security/"]
    REPRO["reproducibility/"]
  end
```
```
tests/
├── contract/           # Adapter contract tests (Jest), 5 slots × 3 impls
├── integration/        # Multi-component integration (Jest)
├── sovereign/          # Nostr-bridge integration (Jest)
├── runtime-contract/   # PRD-002/003 end-to-end (bash + Jest)
├── config/             # Validator semantic-rule tests (Jest)
├── tui/                # Python TOML round-trip (pytest)
├── artefact-probes/    # Per-feature binary-exists probes (bash)
├── bootstrap/          # Entrypoint lifecycle tests (bash)
├── cli/                # agentbox.sh smoke tests (bash)
├── flake/              # Nix eval + generator tests (bash)
├── cuda/               # nvidia-smi smoke (bash)
├── 3dgs/               # COLMAP + METIS smoke (bash)
├── toolchains/         # blender + latex presence (bash)
├── security/           # gitleaks canary (bash)
├── reproducibility/    # nix-build-hash equality (bash)
├── backup/             # backup/restore round-trip (bash)
└── observability/      # metrics registry (Jest)
```
```bash
cd management-api
npm test                                        # full Jest run
npx jest tests/contract                         # narrower
npx jest tests/contract/beads.contract.spec.js  # single file
npx jest --ci --forceExit                       # what CI runs
```

```bash
cd tests/tui
pip install -r requirements.txt   # pytest 8.3.5
pytest -v test_tui_helpers.py
```

Each bash test is self-contained, executable, and emits TAP:

```bash
bash tests/runtime-contract/RC-002-03.sh      # pure file-lint, no Docker
bash tests/cli/smoke.sh
bash tests/reproducibility/nix-build-hash.sh  # requires Nix
bash tests/backup/round-trip.sh               # requires Docker
```

Bash tests exit:

- `0`: all assertions passed
- `1`: real failure
- `77`: skipped (missing Docker / Nix / GPU / etc.)

TAP output: `ok N` / `not ok N` / `ok N # SKIP reason`. The final line is the `1..N` summary.
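A runner that honours these exit codes might tally results roughly as follows. This is a sketch under stated assumptions: `classify_exit` and `run_suite` are hypothetical names, and any exit code other than 0 or 77 is treated as a failure.

```shell
#!/usr/bin/env bash
# Sketch of a runner that honours skip-77; function names are hypothetical.

# Map an exit code to a verdict: 0 passes, 77 skips, anything else fails.
classify_exit() {
  case $1 in
    0)  echo pass ;;
    77) echo skip ;;
    *)  echo fail ;;
  esac
}

# Run each script given as an argument, print one verdict per script,
# and return non-zero if any script genuinely failed.
run_suite() {
  local script rc verdict failures=0
  for script in "$@"; do
    rc=0
    bash "$script" >/dev/null 2>&1 || rc=$?
    verdict=$(classify_exit "$rc")
    printf '%s %s\n' "$verdict" "$script"
    if [ "$verdict" = fail ]; then
      failures=$((failures + 1))
    fi
  done
  [ "$failures" -eq 0 ]
}
```

Because skipped scripts do not count towards `failures`, `run_suite` still returns 0 on a machine without Docker or Nix, which is the point of the convention.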
```bash
# JavaScript
(cd management-api && npm test -- --ci)
# Python
(cd tests/tui && pytest)
# Bash (tolerates skip-77)
shopt -s globstar   # let ** match nested directories (bash 4+)
for f in tests/**/*.sh; do
  bash "$f" || [ $? -eq 77 ] || echo "FAIL: $f"
done
```

```mermaid
flowchart TB
  subgraph pr_gates["PR gates (block merge)"]
    CT["contract-tests.yml"]
    TT["tui-tests.yml"]
    MV["manifest-validate.yml"]
    FC["flake-check.yml"]
    RCT["runtime-contract.yml"]
    SC["shellcheck.yml"]
    SS["secret-scan.yml"]
  end
  CI["ci.yml<br/>aggregate status"] --> CT
  CI --> TT
  CI --> MV
  CI --> FC
  CI --> RCT
  CI --> SC
  CI --> SS
  subgraph post_merge["Post-merge / scheduled"]
    BMA["build-multi-arch.yml"]
    IS["image-scan.yml"]
    REL["release.yml"]
    DOC["docs-ci.yml"]
    NFU["nix-flake-update.yml"]
  end
  BMA --> IS
  BMA -->|"v* tag"| REL
```
| Workflow | Trigger | Runs |
|---|---|---|
| `contract-tests.yml` | PR + push to main | Jest contract suite across every adapter impl (incl. relay-consumer + opf-router paths) |
| `tui-tests.yml` | PR | pytest TUI round-trip fixtures |
| `manifest-validate.yml` | PR + push | `agentbox config validate`, fixture round-trip through TUI read/write, expected-error-code assertions, W-code advisory-vs-error audit |
| `flake-check.yml` | PR | `nix flake check --no-build` on amd64 + arm64 + eval of `.#runtime` and `.#compose` derivations |
| `runtime-contract.yml` | PR + push | Discovers and runs every `tests/runtime-contract/RC-*.sh` |
| `shellcheck.yml` | PR + push | ShellCheck at error severity (blocking) and warning severity (informational) |
| `secret-scan.yml` | PR + push | gitleaks + canary |
| `ci.yml` | PR + push | Aggregate status check; configure as the sole required status in branch protection |
| Workflow | Trigger | Runs |
|---|---|---|
| `build-multi-arch.yml` | push to main, `v*` tag, manual | Nix build + GHCR publish on both arches; closure + compressed size captured to Actions summary; runs the PRD-001 §8 size-ceiling guard |
| `image-scan.yml` | after `build-multi-arch.yml` succeeds, manual | Trivy HIGH/CRITICAL gate, full-severity informational run, CycloneDX + SPDX SBOM uploads, SARIF posted to the Security tab |
| `release.yml` | after `build-multi-arch.yml` on `v*` tag | Extracts matching CHANGELOG section, attaches image-scan artefacts (SBOMs), publishes the GitHub Release; pre-release flag inferred from tag |
| `docs-ci.yml` | PR + push touching `docs/` | Link validation, frontmatter, Mermaid lint, ASCII-diagram detection, UK English, structure; 90% quality gate |
| `nix-flake-update.yml` | Mondays 06:00 UTC, manual | `nix flake update` → PR if `flake.lock` changed |
Failure in any PR-gate workflow blocks merge. ci.yml aggregates the gate into a single status for branch protection rules.
build-multi-arch.yml and flake-check.yml consult a Cachix binary cache when CACHIX_AUTH_TOKEN is set in repository secrets. The cache name comes from the CACHIX_CACHE_NAME repo variable (default dreamlab-ai). Missing secret → no warning; the build falls back to cold compilation. To enable:
- Create a Cachix cache at https://app.cachix.org.
- Add the write token as `CACHIX_AUTH_TOKEN` in repository secrets.
- (Optional) Set the `CACHIX_CACHE_NAME` repo variable if the cache name differs from `dreamlab-ai`.
When package-lock.json changes in any npm-service directory, the matching npmDepsHash in flake.nix needs refreshing. Same for the solid-pod-rs srcHash in lib/solid-pod-rs.nix when the pinned rev bumps.
```bash
# Refresh every fakeHash in one pass; idempotent, safe to re-run.
./scripts/prefetch-hashes.sh
# Just one service:
./scripts/prefetch-hashes.sh --service management-api
# Preview without writing:
./scripts/prefetch-hashes.sh --dry-run
```

Each runtime-contract test maps 1:1 to a PRD-002/003 acceptance criterion.
| Test | AC | What it proves |
|---|---|---|
| RC-002-01 | No-network boot | docker run --network none → /ready returns 200 |
| RC-002-02 | Artefact probes | Every enabled feature's binary exists and is runnable |
| RC-002-03 | Install-lint | Zero npm install / pip install in entrypoint |
| RC-002-04 | Legal-write boundary | /opt/agentbox:ro mount → boot still reaches readiness |
| RC-002-05 | Missing-artefact fatal | Unlinking a required binary → supervisord exits non-zero |
| RC-003-06 | Image ref local + registry | Both AGENTBOX_IMAGE_REF cases reach /ready |
| RC-003-07 | Probes distinct | Delayed-adapter: /livez 200 + /ready 503; both 200 after |
| RC-003-08 | Metrics port chain | Manifest → compose → container → host |
| RC-003-09 | Hardening baseline | docker inspect shows non-root + read_only + cap_drop ALL |
| RC-003-10 | Exception merge | Desktop tmpfs union works; baseline drops preserved |
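To make the shape of a file-lint check such as RC-002-03 concrete, here is a minimal sketch. It is not the repo's script: the function names are hypothetical, and the pattern assumes that any `npm install` or `pip install` outside a comment counts as a violation.

```shell
#!/usr/bin/env bash
# Hypothetical install-lint in the spirit of RC-002-03; the real script may differ.

# Succeed only if the file has no "npm install" / "pip install" outside comments.
# Returns 2 when the file cannot be read, so a missing file never passes silently.
lint_no_runtime_install() {
  local file=$1
  [ -r "$file" ] || return 2
  # ^[^#]* means no '#' may precede the match on that line.
  ! grep -Eq '^[^#]*(npm|pip3?)[[:space:]]+install' "$file"
}

# Emit a single-test TAP report for one entrypoint file.
install_lint_tap() {
  local file=$1
  echo '1..1'
  if lint_no_runtime_install "$file"; then
    echo "ok 1 - no runtime installs in $file"
  else
    echo "not ok 1 - runtime install found in $file"
    return 1
  fi
}
```

A check like this needs neither Docker nor Nix, which is why it never has to exit 77.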
Current (2026-04-24):
| Category | Tests | Passing | Todo/Skip |
|---|---|---|---|
| Contract harness | 178 | 145 | 33 (infra-blocked) |
| Semantic rules | 50 | 49 | 1 (Nix-eval) |
| Runtime-contract | 10 | 10 | 0 |
| Bootstrap | 4 | 4 | 0 |
| Integration | 16 | 16 | 0 |
| TUI pytest | 23 | 23 | 0 |
| Artefact probes | 15 | 15 | 0 |
| Other bash | ~11 | all (skip-77 unless Docker) | — |
33 contract todos legitimately pending on external infrastructure:
- k6 load harness for SLO tests (×15)
- Community Solid Server + WAC for permission-denied (×3)
- ONNX runtime for embedding-error path (×3)
- SSD-backed CI runner for JSONL timing (×3)
- Dedicated HW + synthetic agent for orchestrator SLO (×9)
Each todo carries a one-line note citing the specific missing dependency.
- Add the rule to `scripts/agentbox-config-validate.js` with next-in-sequence error code.
- Add a `describe` block to `tests/config/semantic-rules.test.js` with invalid + valid cases.
- Document in ADR-005 §Validation or ADR-007 §4a.
See adapters.md §Testing. Contract suite runs automatically once the file exists at management-api/adapters/<slot>/<impl>.js.
Say you notice the memory slot contract does not currently assert that search() tolerates an empty corpus. The smallest honest addition is one assertion inside the existing parameterised block:
```js
// tests/contract/memory.contract.spec.js
const IMPLS = ['embedded-ruvector', 'external-pg', 'off'];

describe.each(IMPLS)('memory adapter — %s', (impl) => {
  let adapter;
  beforeAll(async () => { adapter = await makeAdapter('memory', impl); });
  afterAll(async () => { await adapter.disconnect(); });

  test('search() on empty corpus returns [] not error', async () => {
    if (impl === 'off') {
      await expect(adapter.search('anything'))
        .rejects.toThrow(/AdapterDisabled/);
      return;
    }
    const results = await adapter.search('no-vectors-stored-yet');
    expect(Array.isArray(results)).toBe(true);
    expect(results).toHaveLength(0);
  });
});
```

Three properties make this a useful contract test: (1) it branches on off to express the disabled-slot semantics, (2) it asserts behaviour every live impl must share, (3) it fails loudly if any impl treats "empty" as an error. That is the shape to aim for when extending any slot's contract.
Unit tests of handlers with mocked adapters would catch some classes of regression but miss the one class this suite exists to catch: divergence between impls of the same slot. A handler passes against local-sqlite and silently breaks against external because the external impl has a subtly different error shape. The parameterised contract harness is the only layer where those impls are held to identical behaviour.
- If the test covers a PRD-002/003 acceptance criterion, use the next `RC-NNN-NN.sh` slot.
- Use existing `RC-*.sh` files as templates (TAP output, skip-77).
- Add a row to the matrix above.
- Add it to `contract-tests.yml` if it should run per-PR.
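Picking the next free slot can be scripted. The helper below is a hypothetical sketch, not something the repo ships; it assumes the trailing `<nn>` keeps counting across PRDs, as the matrix suggests (RC-002 ends at 05 and RC-003 starts at 06).

```shell
#!/usr/bin/env bash
# Hypothetical helper: suggest the next RC-<prd>-<nn>.sh file name.
# Assumes <nn> is one global sequence shared by all PRDs.

next_rc_name() {
  local dir=$1 prd=$2 max=0 f nn
  for f in "$dir"/RC-*-*.sh; do
    [ -e "$f" ] || continue              # glob matched nothing
    nn=${f##*-}                          # e.g. "03.sh"
    nn=${nn%.sh}                         # e.g. "03"
    case $nn in
      ''|*[!0-9]*) continue ;;           # ignore non-numeric suffixes
    esac
    nn=$((10#$nn))                       # force base 10 so "08" is not octal
    if [ "$nn" -gt "$max" ]; then
      max=$nn
    fi
  done
  printf 'RC-%s-%02d.sh\n' "$prd" "$((max + 1))"
}
```

Called as `next_rc_name tests/runtime-contract 003`, it prints the name to create. If the project numbers each PRD independently instead, the glob would need narrowing to `RC-<prd>-*.sh`.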
- Isolate: `npx jest <file> --ci --forceExit`
- Port collisions (integration uses `portfinder`; re-runs can leak): verify nothing stale before rerun.
- Race conditions in adapter `connect()`: test harness accepts `AGENTBOX_TEST_ADAPTER_DELAY_MS` for deterministic timing.
- Bash: add `set -x` at the top for trace.
- Contract tests use `jest --runInBand` in CI to avoid parallel contention.
```bash
# Lint
npx eslint management-api/
# Tests
(cd management-api && npm test -- --ci)
(cd tests/tui && pytest)
# Validator
node scripts/agentbox-config-validate.js
# Compose regen (if you touched flake.nix)
nix build .#compose && diff result/docker-compose.yml docker-compose.yml
```

CI reruns all of this, but local is faster.