Add LLM-as-judge task output scorer (#94/#59 foundation) by neoneye · Pull Request #226 · PlanExeOrg/PlanExe

neoneye · 2026-03-09T13:52:22Z

Summary

Adds scoring/task_output_scorer.py — LLM-as-judge that scores pipeline task outputs against a 5-dimension rubric (Specificity, Actionability, Completeness, Internal Consistency, Conciseness)
Adds scoring/score_run_task.py — CLI helper to score tasks from completed run directories
Foundation building block for autonomous prompt optimization (#94) and A/B testing promotion (#59)
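The rubric above can be illustrated with a minimal sketch of how per-dimension scores might be combined into one number. The five dimension names and the `DEFAULT_WEIGHTS` name come from this PR. The equal weight values, the 1 to 5 score scale, and the `weighted_score` helper are assumptions made for illustration and are not taken from scoring/task_output_scorer.py.

```python
import math

# Dimension names are from the PR summary.
# The equal weights below are an assumption, not the PR's actual values.
DEFAULT_WEIGHTS = {
    "specificity": 0.2,
    "actionability": 0.2,
    "completeness": 0.2,
    "internal_consistency": 0.2,
    "conciseness": 0.2,
}

# Mirrors the test-plan item "DEFAULT_WEIGHTS sum to 1.0".
assert math.isclose(sum(DEFAULT_WEIGHTS.values()), 1.0)


def weighted_score(scores, weights=DEFAULT_WEIGHTS):
    """Combine per-dimension scores into one weighted total.

    `scores` maps each rubric dimension to a number (a 1 to 5 scale is
    assumed here). Hypothetical helper, not the PR's API.
    """
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights)


if __name__ == "__main__":
    judged = {
        "specificity": 4,
        "actionability": 3,
        "completeness": 5,
        "internal_consistency": 4,
        "conciseness": 2,
    }
    print(round(weighted_score(judged), 2))
```

Because the weights sum to 1.0, the combined score stays on the same scale as the individual dimension scores, which keeps scores from different runs comparable.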

Test plan

ast.parse() all new files
Verify imports succeed in project venv
derive_task_name() correctly parses task filenames
DEFAULT_WEIGHTS sum to 1.0
Manual test with a real LLM and run directory
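Two of the checks in the test plan above can be sketched with the standard library alone. The helper names `parses_cleanly`, `file_parses_cleanly`, and `weights_sum_to_one` are hypothetical and are not part of this PR; the sketch only assumes that the new files are plain Python source and that the weights are a mapping of dimension name to number.

```python
import ast
import math
from pathlib import Path


def parses_cleanly(source):
    """Return True if `source` is syntactically valid Python."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


def file_parses_cleanly(path):
    """Apply the same syntax check to a file on disk."""
    return parses_cleanly(Path(path).read_text(encoding="utf-8"))


def weights_sum_to_one(weights):
    """Check that rubric weights sum to 1.0, tolerating float rounding."""
    return math.isclose(sum(weights.values()), 1.0)
```

Using `math.isclose` instead of `==` avoids a spurious failure when the weights are decimal fractions that do not add up exactly in binary floating point.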

🤖 Generated with Claude Code

Foundation for autonomous prompt optimization (#94) and A/B testing promotion (#59). Scores pipeline task outputs against a 5-dimension rubric (Specificity, Actionability, Completeness, Internal Consistency, Conciseness) using structured LLM output. Includes CLI helper for scoring tasks from completed run directories.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

neoneye mentioned this pull request Mar 9, 2026

Add A/B experiment runner and results tracker (#94/#59) #227

Open

5 tasks

neoneye deleted the branch main March 10, 2026 00:48

neoneye closed this Mar 10, 2026

neoneye reopened this Mar 10, 2026

neoneye changed the base branch from feature/plan-resume-tool to main March 10, 2026 01:44

neoneye mentioned this pull request Mar 11, 2026

Update roadmap status across proposals 111, 102, 109 #243

Merged

1 task

neoneye wants to merge 1 commit into main from feature/94-task-output-scorer
