feat(adf): subscription-based provider routing with persona and skill chains by AlexMikhalev · Pull Request #705 · terraphim/terraphim-ai

AlexMikhalev · 2026-03-20T15:04:05Z

Summary

Extend orchestrator config with provider/model/fallback routing, ProviderTier enum, persona fields, and skill chain registry (Fixes #28, #29, #32, #34)
Subscription guard rejects banned opencode/ Zen prefix at runtime (Fixes #31)
Fallback dispatch with circuit breaker (3 failures -> open for 5 min -> half-open probe) (Fixes #30)
Persona identity injection into agent prompts (8 Terraphim personas) (Fixes #33)
Skill chain integration from terraphim-skills (18 skills) and zestic-engineering-skills (12 skills) with SkillResolver (Fixes #35, #36)
OpenCodeEvent NDJSON parser for opencode CLI output (Fixes #37)
11 integration tests covering provider routing, circuit breaker, subscription guard, NDJSON parsing, skill chains, personas (Fixes #38)
Production orchestrator.toml with 13 agents across Safety/Core/Growth layers (Fixes #39)
4 ADRs: subscription-only providers, four-tier routing, persona identity layer, kimi-for-coding implementation tier

Stats

21 files changed, +4,645 lines
103 tests passing (92 unit + 11 integration)

ADRs

ADR-002: Subscription-only model providers (ban opencode/Zen)
ADR-003: Four-tier model routing (Quick/Deep/Implementation/Oracle)
ADR-004: Terraphim persona identity layer (8 named personas)
ADR-005: kimi-for-coding as implementation tier

Test plan

cargo test -p terraphim_spawner -- 92 unit tests pass
Integration tests -- 11 tests pass (provider routing, circuit breaker, subscription guard, NDJSON, skills, personas)
cargo clippy -p terraphim_spawner -- clean
Rebuild adf binary on bigbox and restart service
Verify agents start with new provider routing

Security hardening for terraphim_rlm crate:

1. Created validation.rs module with:
   - validate_snapshot_name(): Prevents path traversal attacks
   - validate_code_input(): Enforces MAX_CODE_SIZE (1MB) limit
   - validate_session_id(): Validates UUID format
   - validate_recursion_depth(): Prevents stack overflow
   - Security constants: MAX_CODE_SIZE, MAX_INPUT_SIZE, MAX_RECURSION_DEPTH

2. Fixed race condition in firecracker.rs:
   - Changed snapshot counter from read-then-write to atomic write lock
   - Added validate_snapshot_name() call before snapshot creation
   - Prevents TOCTOU vulnerability where concurrent snapshots could exceed limit

3. Enhanced mcp_tools.rs:
   - Added MAX_CODE_SIZE validation for rlm_code tool
   - Added MAX_CODE_SIZE validation for rlm_bash tool
   - Returns proper MCP error format for validation failures

Refs #426
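As a rough illustration of the validation helpers listed in this commit message, the sketch below shows one way the snapshot-name and code-size checks could look. Only the function names and MAX_CODE_SIZE come from the commit message; the ValidationError enum, the allowed character set, and the choice of 1024 * 1024 bytes for "1MB" are assumptions made for the example, not the crate's actual code.

```rust
// Hypothetical sketch: function names follow the commit message,
// bodies and the error type are assumptions.

/// "1MB" from the commit message, assumed here to mean 1024 * 1024 bytes.
pub const MAX_CODE_SIZE: usize = 1024 * 1024;

#[derive(Debug, PartialEq)]
pub enum ValidationError {
    Empty,
    PathTraversal,
    InvalidCharacter(char),
    CodeTooLarge { size: usize, max: usize },
}

/// Reject names that could escape the snapshot directory.
/// Assumed policy: only ASCII alphanumerics, '-' and '_' are allowed.
pub fn validate_snapshot_name(name: &str) -> Result<(), ValidationError> {
    if name.is_empty() {
        return Err(ValidationError::Empty);
    }
    if name.contains("..") || name.contains('/') || name.contains('\\') {
        return Err(ValidationError::PathTraversal);
    }
    if let Some(c) = name
        .chars()
        .find(|c| !(c.is_ascii_alphanumeric() || *c == '-' || *c == '_'))
    {
        return Err(ValidationError::InvalidCharacter(c));
    }
    Ok(())
}

/// Reject code payloads larger than MAX_CODE_SIZE bytes.
pub fn validate_code_input(code: &str) -> Result<(), ValidationError> {
    if code.len() > MAX_CODE_SIZE {
        return Err(ValidationError::CodeTooLarge {
            size: code.len(),
            max: MAX_CODE_SIZE,
        });
    }
    Ok(())
}
```

Under these assumptions, validate_snapshot_name("snap-01") is accepted while "../etc/passwd" is rejected before any filesystem path is built.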

Refs: PR #426

Features:
- fcctl-core to terraphim_firecracker adapter
- Sub-500ms VM allocation (267ms measured)
- ULID-based VM ID enforcement
- Full trait implementation with error preservation
- 119 tests passing

Validation: All acceptance criteria met

…erTier enum Add provider, fallback_provider, fallback_model, and provider_tier fields to AgentDefinition for subscription-based model routing (ADR-002, ADR-003). Add ProviderTier enum (Quick/Deep/Implementation/Oracle) with per-tier timeout values. Add opencode CLI support in spawner arg inference. All new fields are Optional with serde(default) for backward compatibility. Fixes #28 Refs #29 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
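A minimal sketch of what a tier enum with per-tier timeouts might look like. The variant names come from this commit message, and the timeout values are the ones quoted in the spawn_with_fallback() commit message in this PR (Quick=30s, Deep=60s, Impl=120s, Oracle=300s). The timeout() method name is an assumption, and the serde attributes mentioned in the commit are left out so the example has no external dependencies.

```rust
use std::time::Duration;

/// Four routing tiers named in ADR-003.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProviderTier {
    Quick,
    Deep,
    Implementation,
    Oracle,
}

impl ProviderTier {
    /// Per-tier timeout; `timeout` is a hypothetical method name.
    pub fn timeout(self) -> Duration {
        let secs = match self {
            ProviderTier::Quick => 30,
            ProviderTier::Deep => 60,
            ProviderTier::Implementation => 120,
            ProviderTier::Oracle => 300,
        };
        Duration::from_secs(secs)
    }
}
```

Keeping the timeout on the enum means a caller only needs the tier to know how long to wait for a spawned agent.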

Add provider allowlist and banned_providers fields to OrchestratorConfig. Default banned list includes "opencode" (Zen pay-per-use proxy, ADR-002). Add validate_provider() method to spawner AgentConfig that rejects any model string starting with a banned prefix, while correctly allowing opencode-go/ (subscription) and kimi-for-coding/ (subscription). Fixes #31 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
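The guard has to reject opencode/ while still accepting opencode-go/, so a plain starts_with("opencode") check would be too broad. One way to get that behaviour, sketched below, is to compare only the provider segment before the first '/'. The free-function signature, the BannedProviderError type, and the model names in the usage note are assumptions for illustration; the commit describes validate_provider() as a method on AgentConfig.

```rust
/// Hypothetical error type for the sketch.
#[derive(Debug, PartialEq)]
pub struct BannedProviderError {
    pub provider: String,
}

/// Returns the provider segment before the first '/',
/// or the whole string when there is no '/'.
fn provider_segment(model: &str) -> &str {
    model.split('/').next().unwrap_or(model)
}

/// Reject a model string whose provider segment is on the banned list.
/// Matching the whole segment keeps "opencode-go/..." allowed even
/// though it shares a textual prefix with the banned "opencode".
pub fn validate_provider(model: &str, banned: &[&str]) -> Result<(), BannedProviderError> {
    let provider = provider_segment(model);
    if banned.iter().any(|b| *b == provider) {
        return Err(BannedProviderError {
            provider: provider.to_string(),
        });
    }
    Ok(())
}
```

With banned = ["opencode"], a placeholder model such as "opencode/some-model" is rejected, while "opencode-go/some-model" and "kimi-for-coding/some-model" pass.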

Add persona_name, persona_symbol, persona_vibe, meta_cortex_connections, and skill_chain fields to AgentDefinition for the four-layer identity stack (Persona/Role/SFIA/Skills). All fields Optional/default for backward compatibility. See ADR-004. Fixes #32 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add SpawnRequest struct and spawn_with_fallback() method that:
- Validates banned providers before spawn (ADR-002)
- Uses ProviderTier timeout values (Quick=30s, Deep=60s, Impl=120s, Oracle=300s)
- Retries with fallback provider/model on primary failure
- Integrates per-provider circuit breakers (3 failures = open 5 min)
- Returns SpawnerError on both primary and fallback failure

SpawnRequest avoids circular dependency with terraphim_orchestrator by mirroring needed AgentDefinition fields in the spawner crate.

Fixes #30

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
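The circuit breaker behaviour described here (3 failures open the breaker for 5 minutes, after which a probe is allowed) can be sketched as a small state machine. The type and method names below are hypothetical, and the current time is passed in explicitly so the logic can be exercised without sleeping; the spawner's real breaker may be structured differently.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BreakerState {
    Closed,
    Open,
    HalfOpen,
}

/// Hypothetical per-provider circuit breaker.
pub struct CircuitBreaker {
    failure_threshold: u32,
    cooldown: Duration,
    consecutive_failures: u32,
    opened_at: Option<Instant>,
}

impl CircuitBreaker {
    pub fn new(failure_threshold: u32, cooldown: Duration) -> Self {
        Self {
            failure_threshold,
            cooldown,
            consecutive_failures: 0,
            opened_at: None,
        }
    }

    /// Closed until the threshold is hit, Open during the cooldown,
    /// HalfOpen once the cooldown has elapsed.
    pub fn state(&self, now: Instant) -> BreakerState {
        match self.opened_at {
            None => BreakerState::Closed,
            Some(t) if now.saturating_duration_since(t) >= self.cooldown => BreakerState::HalfOpen,
            Some(_) => BreakerState::Open,
        }
    }

    /// Requests pass when Closed, and as a probe when HalfOpen.
    pub fn allow_request(&self, now: Instant) -> bool {
        self.state(now) != BreakerState::Open
    }

    /// Any success closes the breaker and resets the failure count.
    pub fn record_success(&mut self) {
        self.consecutive_failures = 0;
        self.opened_at = None;
    }

    /// A failure at or past the threshold (including a failed probe)
    /// opens the breaker and restarts the cooldown.
    pub fn record_failure(&mut self, now: Instant) {
        self.consecutive_failures += 1;
        if self.consecutive_failures >= self.failure_threshold {
            self.opened_at = Some(now);
        }
    }
}
```

A caller in the spirit of spawn_with_fallback() would construct one breaker per provider with CircuitBreaker::new(3, Duration::from_secs(300)), check allow_request() before the primary spawn, and go straight to the fallback provider when it returns false.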

Add persona_name, persona_symbol, persona_vibe, and meta_cortex_connections to SpawnRequest. The build_persona_prefix() function generates a markdown identity block that is prepended to the agent task prompt when persona fields are configured. Agents without persona config are unaffected. Fixes #33 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add SkillChainRegistry with 31 terraphim-skills and 16 zestic-skills. Provides validate_chain() to verify skill chains and validate_skill_chains() on OrchestratorConfig to validate all agents. Backward compatible -- agents with empty skill_chain pass validation. Fixes #34 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add OpenCodeEvent struct with parsing for opencode run --format json output events (step_start, text, tool_use, step_finish, result). Includes text_content(), total_tokens(), parse_line(), and parse_lines() helper methods for structured output consumption. Fixes #37 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add SkillResolver in skill_resolver.rs that maps skill chain names to actual skill file paths from the terraphim-skills repository.

Features:
- SkillResolver with registry of terraphim skills (security-audit, code-review, session-search, local-knowledge, git-safety-guard, devops, disciplined-research, architecture, disciplined-design, requirements-traceability, testing, acceptance-testing, documentation, md-book, implementation, rust-development, visual-testing, quality-gate)
- resolve_skill_chain() method that takes Vec<skill_name> and returns resolved skill metadata (name, description, applicable_to, paths, source)
- Validation of skill chains with proper error handling
- Comprehensive test coverage including valid resolution, missing skill errors, and empty chain handling
- Exported from crate as SkillResolver, SkillSource, ResolvedSkill, SkillResolutionError

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…h Refs #36

Extend SkillResolver to support zestic-engineering-skills from zestic-ai/6d-prompts repository alongside existing terraphim-skills.

Features:
- Add SkillSource enum (Terraphim, Zestic) to distinguish skill origins
- Initialize zestic skills registry with 12 skills: quality-oversight, responsible-ai, insight-synthesis, perspective-investigation, product-vision, wardley-mapping, business-scenario-design, rust-mastery, cross-platform, frontend, via-negativa-analysis, strategy-execution
- Each resolved skill includes its source metadata
- SkillChainRegistry validation accepts skills from both sources
- Comprehensive tests for mixed skill chains (terraphim + zestic together)

Tests added:
- test_resolver_has_zestic_skills
- test_resolve_zestic_skill
- test_resolve_mixed_skill_chain
- test_validate_mixed_skill_chain_valid
- test_validate_only_zestic_skills
- test_mixed_chain_with_invalid_skills
- test_zestic_skill_source_in_resolved
- test_all_skill_names_includes_zestic
- test_zestic_skill_structure
- test_resolver_custom_paths_for_zestic

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>


…routing Refs #39 Replace all legacy codex CLI references with opencode + subscription providers. Add persona, skill_chain, provider/model/fallback fields to all agents. New agents: compliance-watchdog, drift-detector, spec-validator, test-guardian, documentation-generator, implementation-swarm, compound-review. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

SSH non-interactive shells (systemd) do not include ~/.bun/bin or ~/.local/bin in PATH. Use full paths for opencode and claude binaries. Also fix TOML escape in --since quote. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Creates crates/terraphim_judge_evaluator with:
- SimpleAgent struct wrapping terraphim_router for KG lookups
- KgMatch struct for representing term matches
- lookup_terms() method using Aho-Corasick automata
- enrich_prompt() for appending KG context to judge prompts
- Comprehensive tests for all functionality

Also adds automation/judge/model-mapping.json configuration file.

Implements JudgeModelRouter for tier-based LLM model selection:
- TierConfig struct for provider+model pairs
- ModelMappingConfig for complete configuration
- JudgeModelRouter with methods:
  * from_config(path) - load from JSON file
  * resolve_tier(tier) - get provider/model for a tier
  * resolve_profile(profile) - get tier sequence for a profile
- Supports tiers: quick, deep, tiebreaker, oracle
- Supports profiles: default, thorough, critical, exhaustive
- Comprehensive tests for all functionality
- Updates automation/judge/model-mapping.json with profile definitions

Implements JudgeAgent as a supervised agent combining:
- SimpleAgent for Knowledge Graph context enrichment
- JudgeModelRouter for tier-based model selection
- SupervisedAgent trait from terraphim_agent_supervisor

JudgeVerdict struct with:
- verdict: final evaluation result (PASS/FAIL/NEEDS_REVIEW)
- scores: BTreeMap of detailed category scores
- judge_tier: which tier produced the verdict
- judge_cli: command used for evaluation
- latency_ms: evaluation duration

Evaluation pipeline:
1. Load file content
2. Enrich with KG context (optional)
3. Select model tier based on profile
4. Parse verdict from CLI output (JSON or text format)

Includes comprehensive tests for:
- Verdict parsing and manipulation
- Supervised agent lifecycle
- Profile-based tier selection
- System message handling

- Add stagger_delay_ms config field (default: 5000ms)
- Insert stagger delay between Safety agent spawns in run()
- Add random jitter (0 to stagger_delay_ms) for Core agent cron spawns
- Add tests for stagger delay configuration

Refs #16

- Add ReviewRequest struct with from_agent, to_agent, artifact_path, review_type
- Add ReviewPair config for defining (producer, reviewer) pairs
- Add review_queue: Vec<ReviewRequest> to orchestrator state
- Add submit_review_request(), review_queue(), process_review_queue() methods
- Add check_review_trigger() to automatically queue reviews on agent completion
- Add tests for review queue operations and config loading

Refs #17

- Add drift_detection module with DriftDetector and DriftReport
- Load strategic goals from plans/ directory markdown files
- Check drift every N ticks (configurable via drift_detection.check_interval_ticks)
- Calculate drift score by comparing agent outputs against strategic goals
- DriftReport includes agent, drift_score, and explanation
- Log warnings when drift_score exceeds threshold (default: 0.6)
- Add DriftDetectionConfig to OrchestratorConfig
- Add tests for drift detection functionality

Refs #18

- Add session_rotation module with SessionRotationManager and AgentSession
- Add SessionRotationConfig with max_sessions_before_rotation (default: 10)
- Track completions_since_rotation and completed_sessions per agent
- on_agent_completion() records completion and triggers rotation at threshold
- Rotation creates new session ID and clears accumulated context
- Add comprehensive tests for session rotation

Refs #19
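The rotation rule above can be illustrated with a small counter-based sketch. The struct and field names follow the commit message, but the method signature, the boolean return value, the numeric session IDs, and the Vec<String> context are assumptions chosen to keep the example dependency-free.

```rust
use std::collections::HashMap;

/// Per-agent session bookkeeping (field set is partly assumed).
#[derive(Debug, Default)]
pub struct AgentSession {
    pub session_id: u64,
    pub completions_since_rotation: u32,
    pub completed_sessions: u32,
    pub context: Vec<String>,
}

pub struct SessionRotationManager {
    max_sessions_before_rotation: u32,
    next_session_id: u64,
    sessions: HashMap<String, AgentSession>,
}

impl SessionRotationManager {
    pub fn new(max_sessions_before_rotation: u32) -> Self {
        Self {
            max_sessions_before_rotation,
            next_session_id: 1,
            sessions: HashMap::new(),
        }
    }

    /// Record one completion for `agent`; returns true when the
    /// threshold was reached and the session was rotated.
    pub fn on_agent_completion(&mut self, agent: &str, output: &str) -> bool {
        let session = self.sessions.entry(agent.to_string()).or_default();
        session.completions_since_rotation += 1;
        session.completed_sessions += 1;
        session.context.push(output.to_string());
        if session.completions_since_rotation < self.max_sessions_before_rotation {
            return false;
        }
        // Rotation: assign a fresh session ID and drop accumulated context.
        session.session_id = self.next_session_id;
        self.next_session_id += 1;
        session.completions_since_rotation = 0;
        session.context.clear();
        true
    }

    pub fn session(&self, agent: &str) -> Option<&AgentSession> {
        self.sessions.get(agent)
    }
}
```

With the default threshold of 10 quoted above, the tenth completion for an agent would rotate its session while the lifetime completed_sessions counter keeps increasing.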

- Add convergence_detector module with ConvergenceDetector and ConvergenceSignal
- Add ConvergenceConfig with threshold (default: 0.95) and consecutive_threshold (default: 3)
- Calculate output similarity using Jaccard index on word sets
- Detect convergence after N consecutive similar outputs
- ConvergenceSignal includes agent, similarity, and consecutive_count
- Reset on divergence to handle changing outputs
- Add comprehensive tests for convergence detection

Refs #20
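The Jaccard index on word sets is |A ∩ B| / |A ∪ B|, where A and B are the sets of whitespace-separated words in two outputs. The sketch below pairs that with a consecutive-match counter. The observe() method, its tuple return value, and the choice to count similar comparisons (rather than similar outputs) are assumptions; the real ConvergenceDetector emits a ConvergenceSignal that also carries the agent name.

```rust
use std::collections::HashSet;

/// Jaccard index on whitespace-separated word sets.
/// Two empty inputs are treated as identical (an assumption).
pub fn jaccard_similarity(a: &str, b: &str) -> f64 {
    let wa: HashSet<&str> = a.split_whitespace().collect();
    let wb: HashSet<&str> = b.split_whitespace().collect();
    if wa.is_empty() && wb.is_empty() {
        return 1.0;
    }
    let intersection = wa.intersection(&wb).count() as f64;
    let union = wa.union(&wb).count() as f64;
    intersection / union
}

pub struct ConvergenceDetector {
    threshold: f64,
    consecutive_threshold: u32,
    previous: Option<String>,
    consecutive_count: u32,
}

impl ConvergenceDetector {
    pub fn new(threshold: f64, consecutive_threshold: u32) -> Self {
        Self {
            threshold,
            consecutive_threshold,
            previous: None,
            consecutive_count: 0,
        }
    }

    /// Compare `output` with the previous output. Returns
    /// Some((similarity, consecutive_count)) once enough consecutive
    /// comparisons have met the threshold; resets on divergence.
    pub fn observe(&mut self, output: &str) -> Option<(f64, u32)> {
        let similarity = self
            .previous
            .as_deref()
            .map(|prev| jaccard_similarity(prev, output));
        self.previous = Some(output.to_string());
        match similarity {
            Some(s) if s >= self.threshold => {
                self.consecutive_count += 1;
                if self.consecutive_count >= self.consecutive_threshold {
                    Some((s, self.consecutive_count))
                } else {
                    None
                }
            }
            _ => {
                self.consecutive_count = 0;
                None
            }
        }
    }
}
```

With the defaults quoted above (threshold 0.95, consecutive_threshold 3), this sketch signals on the fourth identical output, because the first output has nothing to be compared against.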


…iven mode

Issue #8: Extend orchestrator config for issue-driven mode
- Add WorkflowConfig with mode (time_only/issue_only/dual), poll_interval_secs, max_concurrent_tasks
- Add TrackerConfig with tracker_type (gitea/linear), url, token_env_var, owner, repo
- Add ConcurrencyConfig with max_parallel_agents, queue_depth, starvation_timeout_secs
- All optional in orchestrator.toml with backward compatible defaults (time_only mode is default)
- Tests: parse config with and without workflow section, defaults applied

Issue #9: Unified dispatcher
- Create crates/terraphim_orchestrator/src/dispatcher.rs
- DispatchTask enum: TimeTask(agent_name, schedule), IssueTask(agent_name, issue_id, priority)
- DispatchQueue: priority queue backed by BinaryHeap with round-robin fairness
- ConcurrencyController: semaphore-based with starvation timeout
- Methods: submit(task), next() -> Option<DispatchTask>, active_count(), is_full()
- Fairness: alternates between time and issue tasks at equal priority
- Tests: submit/dequeue, priority ordering, concurrency limits, fairness

Issue #10: Issue mode controller
- Create crates/terraphim_orchestrator/src/issue_mode.rs
- IssueMode struct using terraphim_tracker::GiteaTracker to poll for issues
- Poll loop every poll_interval_secs fetching ready issues via PageRank sorting
- Filter out blocked issues and already-running tasks
- Map issues to agents based on labels ([ADF] -> implementation-swarm) or title patterns
- Submit IssueTask to DispatchQueue
- Tests: poll cycle, issue-to-agent mapping, priority calculation, blocked issue filtering

Issue #11: Time mode refactor
- Refactor crates/terraphim_orchestrator/src/scheduler.rs
- Extract TimeMode struct wrapping existing cron scheduler
- TimeMode submits TimeTask to DispatchQueue instead of spawning directly (when configured)
- Maintain backward compatibility: legacy mode spawns directly if no WorkflowConfig
- Tests: TimeMode submits to queue, legacy mode still works

Refs #8 #9 #10 #11
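To make the dispatcher's ordering rule concrete, here is a sketch of a priority queue that alternates between time and issue tasks when their priorities tie. The DispatchTask variants follow the commit message, but the field types, the fixed TIME_TASK_PRIORITY, the two-heap layout, and the FIFO tie-break within one kind are assumptions; the ConcurrencyController, active_count(), and is_full() are left out.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Variant shapes follow the commit message; field types are assumed.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub enum DispatchTask {
    TimeTask(String, String),      // (agent_name, schedule)
    IssueTask(String, u64, u8),    // (agent_name, issue_id, priority)
}

/// Assumed fixed priority for time tasks, which carry none of their own.
const TIME_TASK_PRIORITY: u8 = 5;

/// Heap key: higher priority first, then lower sequence number (FIFO).
/// The sequence number is unique, so the task itself never decides order.
type Entry = (u8, Reverse<u64>, DispatchTask);

#[derive(Default)]
pub struct DispatchQueue {
    time: BinaryHeap<Entry>,
    issue: BinaryHeap<Entry>,
    seq: u64,
    last_was_time: bool,
}

impl DispatchQueue {
    pub fn submit(&mut self, task: DispatchTask) {
        self.seq += 1;
        let (priority, is_time) = match &task {
            DispatchTask::TimeTask(..) => (TIME_TASK_PRIORITY, true),
            DispatchTask::IssueTask(_, _, p) => (*p, false),
        };
        let entry = (priority, Reverse(self.seq), task);
        if is_time {
            self.time.push(entry);
        } else {
            self.issue.push(entry);
        }
    }

    /// Pop the highest-priority task; on a priority tie between the two
    /// kinds, serve the kind that was not served last.
    #[allow(clippy::should_implement_trait)]
    pub fn next(&mut self) -> Option<DispatchTask> {
        let time_top = self.time.peek().map(|e| e.0);
        let issue_top = self.issue.peek().map(|e| e.0);
        let take_time = match (time_top, issue_top) {
            (None, None) => return None,
            (Some(_), None) => true,
            (None, Some(_)) => false,
            (Some(t), Some(i)) if t != i => t > i,
            (Some(_), Some(_)) => !self.last_was_time,
        };
        self.last_was_time = take_time;
        let heap = if take_time { &mut self.time } else { &mut self.issue };
        heap.pop().map(|(_, _, task)| task)
    }
}
```

Splitting the queue into one heap per task kind keeps the fairness rule to a single comparison of the two heap tops, at the cost of not being a single BinaryHeap as the commit message describes.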

- Add ModeCoordinator that manages TimeMode and IssueMode
- Implement unified shutdown: signal modes, drain queue, wait for active tasks
- Extend spawner integration to dispatch tasks from queue to agents
- Add stall detection: log warning when queue exceeds threshold
- Add comprehensive tests for dual mode, shutdown coordination, and stall detection

Refs #12

Create comprehensive E2E test suite in tests/e2e_tests.rs:
- test_dual_mode_operation: verify both time and issue tasks processed
- test_time_mode_only: legacy config compatibility
- test_issue_mode_only: issue-only config verification
- test_fairness_under_load: no starvation between task types
- test_graceful_shutdown: clean termination with queue draining
- test_stall_detection: warning logged when queue exceeds threshold
- Additional tests for concurrency limits, prioritization, and backward compatibility

Uses mock tracker and avoids real API calls.

Refs #13

- Create src/compat.rs: Symphony compatibility layer with type aliases, adapters
- Add migration helpers for enabling dual mode from legacy configs
- Create MIGRATION.md with comprehensive migration guide
- Update CLAUDE.md with dual mode architecture description
- Include backward compatibility notes and configuration examples
- Export compat module from lib.rs

Refs #14

- Run cargo test --workspace (orchestrator tests pass)
- Fix all warnings in orchestrator crate
- Create CHANGELOG.md entry for v1.9.0 release
- Verify backward compatibility with test_backward_compatibility
- Build release binary: cargo build --release -p terraphim_orchestrator
- All 142 tests pass in orchestrator crate

Release includes:
- Dual mode orchestrator with TimeMode and IssueMode
- Unified dispatch queue with fairness
- Stall detection and graceful shutdown
- Symphony compatibility layer
- Comprehensive E2E test suite
- Migration documentation

Refs #15

AlexMikhalev and others added 30 commits March 17, 2026 16:24

feat(spawner): add integration tests for opencode provider dispatch Refs #38

812555c

fix(ops): use absolute CLI paths in orchestrator.toml

708500c


feat(judge): add parallel batch evaluation via ExecutionCoordinator Refs #23

67d3b59

feat(judge): build judge-evaluator CLI binary Refs #27

bed0812

feat(spawner): add integration tests for ClaudeCodeSession Refs #5

0199a91

feat(workspace): extract terraphim_workspace crate Refs #6

2c0a53a

feat(tracker): extract terraphim_tracker crate Refs #7

d57bc27

AlexMikhalev added 2 commits March 20, 2026 20:55

AlexMikhalev wants to merge 32 commits into main from task/28-provider-model-fallback-schema
