feat(adf): subscription-based provider routing with persona and skill chains#705
Open
AlexMikhalev wants to merge 32 commits intomainfrom
Open
feat(adf): subscription-based provider routing with persona and skill chains#705AlexMikhalev wants to merge 32 commits intomainfrom
AlexMikhalev wants to merge 32 commits intomainfrom
Conversation
Security hardening for terraphim_rlm crate: 1. Created validation.rs module with: - validate_snapshot_name(): Prevents path traversal attacks - validate_code_input(): Enforces MAX_CODE_SIZE (1MB) limit - validate_session_id(): Validates UUID format - validate_recursion_depth(): Prevents stack overflow - Security constants: MAX_CODE_SIZE, MAX_INPUT_SIZE, MAX_RECURSION_DEPTH 2. Fixed race condition in firecracker.rs: - Changed snapshot counter from read-then-write to atomic write lock - Added validate_snapshot_name() call before snapshot creation - Prevents TOCTOU vulnerability where concurrent snapshots could exceed limit 3. Enhanced mcp_tools.rs: - Added MAX_CODE_SIZE validation for rlm_code tool - Added MAX_CODE_SIZE validation for rlm_bash tool - Returns proper MCP error format for validation failures Refs #426
Refs: PR #426 Features: - fcctl-core to terraphim_firecracker adapter - Sub-500ms VM allocation (267ms measured) - ULID-based VM ID enforcement - Full trait implementation with error preservation - 119 tests passing Validation: All acceptance criteria met
…erTier enum Add provider, fallback_provider, fallback_model, and provider_tier fields to AgentDefinition for subscription-based model routing (ADR-002, ADR-003). Add ProviderTier enum (Quick/Deep/Implementation/Oracle) with per-tier timeout values. Add opencode CLI support in spawner arg inference. All new fields are Optional with serde(default) for backward compatibility. Fixes #28 Refs #29 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add provider allowlist and banned_providers fields to OrchestratorConfig. Default banned list includes "opencode" (Zen pay-per-use proxy, ADR-002). Add validate_provider() method to spawner AgentConfig that rejects any model string starting with a banned prefix, while correctly allowing opencode-go/ (subscription) and kimi-for-coding/ (subscription). Fixes #31 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add persona_name, persona_symbol, persona_vibe, meta_cortex_connections, and skill_chain fields to AgentDefinition for the four-layer identity stack (Persona/Role/SFIA/Skills). All fields Optional/default for backward compatibility. See ADR-004. Fixes #32 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add SpawnRequest struct and spawn_with_fallback() method that: - Validates banned providers before spawn (ADR-002) - Uses ProviderTier timeout values (Quick=30s, Deep=60s, Impl=120s, Oracle=300s) - Retries with fallback provider/model on primary failure - Integrates per-provider circuit breakers (3 failures = open 5 min) - Returns SpawnerError on both primary and fallback failure SpawnRequest avoids circular dependency with terraphim_orchestrator by mirroring needed AgentDefinition fields in the spawner crate. Fixes #30 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add persona_name, persona_symbol, persona_vibe, and meta_cortex_connections to SpawnRequest. The build_persona_prefix() function generates a markdown identity block that is prepended to the agent task prompt when persona fields are configured. Agents without persona config are unaffected. Fixes #33 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add SkillChainRegistry with 31 terraphim-skills and 16 zestic-skills. Provides validate_chain() to verify skill chains and validate_skill_chains() on OrchestratorConfig to validate all agents. Backward compatible -- agents with empty skill_chain pass validation. Fixes #34 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add OpenCodeEvent struct with parsing for opencode run --format json output events (step_start, text, tool_use, step_finish, result). Includes text_content(), total_tokens(), parse_line(), and parse_lines() helper methods for structured output consumption. Fixes #37 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add SkillResolver in skill_resolver.rs that maps skill chain names to actual skill file paths from the terraphim-skills repository. Features: - SkillResolver with registry of terraphim skills (security-audit, code-review, session-search, local-knowledge, git-safety-guard, devops, disciplined-research, architecture, disciplined-design, requirements-traceability, testing, acceptance-testing, documentation, md-book, implementation, rust-development, visual-testing, quality-gate) - resolve_skill_chain() method that takes Vec<skill_name> and returns resolved skill metadata (name, description, applicable_to, paths, source) - Validation of skill chains with proper error handling - Comprehensive test coverage including valid resolution, missing skill errors, and empty chain handling - Exported from crate as SkillResolver, SkillSource, ResolvedSkill, SkillResolutionError Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…h Refs #36 Extend SkillResolver to support zestic-engineering-skills from zestic-ai/6d-prompts repository alongside existing terraphim-skills. Features: - Add SkillSource enum (Terraphim, Zestic) to distinguish skill origins - Initialize zestic skills registry with 12 skills: quality-oversight, responsible-ai, insight-synthesis, perspective-investigation, product-vision, wardley-mapping, business-scenario-design, rust-mastery, cross-platform, frontend, via-negativa-analysis, strategy-execution - Each resolved skill includes its source metadata - SkillChainRegistry validation accepts skills from both sources - Comprehensive tests for mixed skill chains (terraphim + zestic together) Tests added: - test_resolver_has_zestic_skills - test_resolve_zestic_skill - test_resolve_mixed_skill_chain - test_validate_mixed_skill_chain_valid - test_validate_only_zestic_skills - test_mixed_chain_with_invalid_skills - test_zestic_skill_source_in_resolved - test_all_skill_names_includes_zestic - test_zestic_skill_structure - test_resolver_custom_paths_for_zestic Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…routing Refs #39 Replace all legacy codex CLI references with opencode + subscription providers. Add persona, skill_chain, provider/model/fallback fields to all agents. New agents: compliance-watchdog, drift-detector, spec-validator, test-guardian, documentation-generator, implementation-swarm, compound-review. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
SSH non-interactive shells (systemd) do not include ~/.bun/bin or ~/.local/bin in PATH. Use full paths for opencode and claude binaries. Also fix TOML escape in --since quote. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Creates crates/terraphim_judge_evaluator with: - SimpleAgent struct wrapping terraphim_router for KG lookups - KgMatch struct for representing term matches - lookup_terms() method using Aho-Corasick automata - enrich_prompt() for appending KG context to judge prompts - Comprehensive tests for all functionality Also adds automation/judge/model-mapping.json configuration file.
Implements JudgeModelRouter for tier-based LLM model selection: - TierConfig struct for provider+model pairs - ModelMappingConfig for complete configuration - JudgeModelRouter with methods: * from_config(path) - load from JSON file * resolve_tier(tier) - get provider/model for a tier * resolve_profile(profile) - get tier sequence for a profile - Supports tiers: quick, deep, tiebreaker, oracle - Supports profiles: default, thorough, critical, exhaustive - Comprehensive tests for all functionality - Updates automation/judge/model-mapping.json with profile definitions
Implements JudgeAgent as a supervised agent combining: - SimpleAgent for Knowledge Graph context enrichment - JudgeModelRouter for tier-based model selection - SupervisedAgent trait from terraphim_agent_supervisor JudgeVerdict struct with: - verdict: final evaluation result (PASS/FAIL/NEEDS_REVIEW) - scores: BTreeMap of detailed category scores - judge_tier: which tier produced the verdict - judge_cli: command used for evaluation - latency_ms: evaluation duration Evaluation pipeline: 1. Load file content 2. Enrich with KG context (optional) 3. Select model tier based on profile 4. Parse verdict from CLI output (JSON or text format) Includes comprehensive tests for: - Verdict parsing and manipulation - Supervised agent lifecycle - Profile-based tier selection - System message handling
- Add stagger_delay_ms config field (default: 5000ms) - Insert stagger delay between Safety agent spawns in run() - Add random jitter (0 to stagger_delay_ms) for Core agent cron spawns - Add tests for stagger delay configuration Refs #16
- Add ReviewRequest struct with from_agent, to_agent, artifact_path, review_type - Add ReviewPair config for defining (producer, reviewer) pairs - Add review_queue: Vec<ReviewRequest> to orchestrator state - Add submit_review_request(), review_queue(), process_review_queue() methods - Add check_review_trigger() to automatically queue reviews on agent completion - Add tests for review queue operations and config loading Refs #17
- Add drift_detection module with DriftDetector and DriftReport - Load strategic goals from plans/ directory markdown files - Check drift every N ticks (configurable via drift_detection.check_interval_ticks) - Calculate drift score by comparing agent outputs against strategic goals - DriftReport includes agent, drift_score, and explanation - Log warnings when drift_score exceeds threshold (default: 0.6) - Add DriftDetectionConfig to OrchestratorConfig - Add tests for drift detection functionality Refs #18
- Add session_rotation module with SessionRotationManager and AgentSession - Add SessionRotationConfig with max_sessions_before_rotation (default: 10) - Track completions_since_rotation and completed_sessions per agent - on_agent_completion() records completion and triggers rotation at threshold - Rotation creates new session ID and clears accumulated context - Add comprehensive tests for session rotation Refs #19
- Add convergence_detector module with ConvergenceDetector and ConvergenceSignal - Add ConvergenceConfig with threshold (default: 0.95) and consecutive_threshold (default: 3) - Calculate output similarity using Jaccard index on word sets - Detect convergence after N consecutive similar outputs - ConvergenceSignal includes agent, similarity, and consecutive_count - Reset on divergence to handle changing outputs - Add comprehensive tests for convergence detection Refs #20
…iven mode Issue #8: Extend orchestrator config for issue-driven mode - Add WorkflowConfig with mode (time_only/issue_only/dual), poll_interval_secs, max_concurrent_tasks - Add TrackerConfig with tracker_type (gitea/linear), url, token_env_var, owner, repo - Add ConcurrencyConfig with max_parallel_agents, queue_depth, starvation_timeout_secs - All optional in orchestrator.toml with backward compatible defaults (time_only mode is default) - Tests: parse config with and without workflow section, defaults applied Issue #9: Unified dispatcher - Create crates/terraphim_orchestrator/src/dispatcher.rs - DispatchTask enum: TimeTask(agent_name, schedule), IssueTask(agent_name, issue_id, priority) - DispatchQueue: priority queue backed by BinaryHeap with round-robin fairness - ConcurrencyController: semaphore-based with starvation timeout - Methods: submit(task), next() -> Option<DispatchTask>, active_count(), is_full() - Fairness: alternates between time and issue tasks at equal priority - Tests: submit/dequeue, priority ordering, concurrency limits, fairness Issue #10: Issue mode controller - Create crates/terraphim_orchestrator/src/issue_mode.rs - IssueMode struct using terraphim_tracker::GiteaTracker to poll for issues - Poll loop every poll_interval_secs fetching ready issues via PageRank sorting - Filter out blocked issues and already-running tasks - Map issues to agents based on labels ([ADF] -> implementation-swarm) or title patterns - Submit IssueTask to DispatchQueue - Tests: poll cycle, issue-to-agent mapping, priority calculation, blocked issue filtering Issue #11: Time mode refactor - Refactor crates/terraphim_orchestrator/src/scheduler.rs - Extract TimeMode struct wrapping existing cron scheduler - TimeMode submits TimeTask to DispatchQueue instead of spawning directly (when configured) - Maintain backward compatibility: legacy mode spawns directly if no WorkflowConfig - Tests: TimeMode submits to queue, legacy mode still works Refs #8 #9 #10 #11
- Add ModeCoordinator that manages TimeMode and IssueMode - Implement unified shutdown: signal modes, drain queue, wait for active tasks - Extend spawner integration to dispatch tasks from queue to agents - Add stall detection: log warning when queue exceeds threshold - Add comprehensive tests for dual mode, shutdown coordination, and stall detection Refs #12
Create comprehensive E2E test suite in tests/e2e_tests.rs: - test_dual_mode_operation: verify both time and issue tasks processed - test_time_mode_only: legacy config compatibility - test_issue_mode_only: issue-only config verification - test_fairness_under_load: no starvation between task types - test_graceful_shutdown: clean termination with queue draining - test_stall_detection: warning logged when queue exceeds threshold - Additional tests for concurrency limits, prioritization, and backward compatibility Uses mock tracker and avoids real API calls. Refs #13
- Create src/compat.rs: Symphony compatibility layer with type aliases, adapters - Add migration helpers for enabling dual mode from legacy configs - Create MIGRATION.md with comprehensive migration guide - Update CLAUDE.md with dual mode architecture description - Include backward compatibility notes and configuration examples - Export compat module from lib.rs Refs #14
- Run cargo test --workspace (orchestrator tests pass) - Fix all warnings in orchestrator crate - Create CHANGELOG.md entry for v1.9.0 release - Verify backward compatibility with test_backward_compatibility - Build release binary: cargo build --release -p terraphim_orchestrator - All 142 tests pass in orchestrator crate Release includes: - Dual mode orchestrator with TimeMode and IssueMode - Unified dispatch queue with fairness - Stall detection and graceful shutdown - Symphony compatibility layer - Comprehensive E2E test suite - Migration documentation Refs #15
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
opencode/Zen prefix at runtime (Fixes Fixed APIs for create article from server and search #31)Stats
ADRs
Test plan
cargo test -p terraphim_spawner-- 92 unit tests passcargo clippy -p terraphim_spawner-- clean