
Conversation

@Chibionos (Contributor)

Summary

This PR fixes the validation error that occurs when {{ActualOutput}} or {{ExpectedOutput}} placeholders are missing from LLM judge evaluator prompts. Instead of raising an error, we now automatically append the missing placeholders in clearly marked sections at the end of the prompt.

Problem

Users were encountering this error:

ValueError: Failed to create evaluator from file 'evals/evaluators/evaluator-xxx.json': 
1 validation error for LegacyLlmAsAJudgeEvaluator
prompt
  Value error, Prompt must contain both {ActualOutput} and {ExpectedOutput} placeholders

Solution

Modified the validation logic in both evaluator implementations to be more user-friendly (a sketch of the new append behavior follows the list below):

Behavior

  • Both placeholders present: No changes made to the prompt
  • One placeholder missing: Automatically append only the missing placeholder in a new section
  • Both placeholders missing: Automatically append both placeholders in separate sections
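
For illustration, a minimal Python sketch of that behavior. The helper name ensure_placeholders and the exact section headers are illustrative; the real change lives in the validate_prompt_placeholders validators listed under Changes:

def ensure_placeholders(prompt: str) -> str:
    """Append any missing {{ActualOutput}} / {{ExpectedOutput}} placeholders."""
    sections = {
        "{{ActualOutput}}": "\n\n## Actual Output\n{{ActualOutput}}",
        "{{ExpectedOutput}}": "\n\n## Expected Output\n{{ExpectedOutput}}",
    }
    for placeholder, section in sections.items():
        if placeholder not in prompt:
            prompt += section  # append only what is missing, in its own section
    return prompt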

Example

If a prompt is missing {{ExpectedOutput}}, it will be transformed from:

Evaluate this output: {{ActualOutput}}

To:

Evaluate this output: {{ActualOutput}}

## Expected Output
{{ExpectedOutput}}
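
A quick usage check of the illustrative helper sketched above, reproducing this transformation:

prompt = "Evaluate this output: {{ActualOutput}}"
print(ensure_placeholders(prompt))
# Evaluate this output: {{ActualOutput}}
#
# ## Expected Output
# {{ExpectedOutput}}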

Changes

  • Modified src/uipath/eval/evaluators/legacy_llm_as_judge_evaluator.py
    • Changed validate_prompt_placeholders from raising ValueError to auto-fixing prompt
  • Modified src/uipath/eval/evaluators/llm_as_judge_evaluator.py
    • Changed validate_prompt_placeholders from raising UiPathEvaluationError to auto-fixing prompt

Benefits

  • Better UX: No more confusing validation errors
  • Backward compatible: Prompts with both placeholders work exactly as before
  • Fail-safe: Ensures evaluators always have access to required data
  • Clear sections: Missing placeholders are added with descriptive headers

Testing

The fix can be tested with evaluator configs that have (a pytest sketch follows this list):

  1. Both placeholders present (should work as before)
  2. Only {{ActualOutput}} present ({{ExpectedOutput}} should be auto-added)
  3. Only {{ExpectedOutput}} present ({{ActualOutput}} should be auto-added)
  4. Neither placeholder present (both should be auto-added)
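
A hedged pytest sketch covering those four scenarios; it exercises the illustrative ensure_placeholders helper from the Solution section rather than the real validator entry point:

import pytest

# ensure_placeholders is the illustrative helper sketched in the Solution section.
@pytest.mark.parametrize(
    "prompt",
    [
        "Compare {{ActualOutput}} with {{ExpectedOutput}}",  # 1. both present
        "Evaluate this output: {{ActualOutput}}",            # 2. ExpectedOutput missing
        "The expected answer is {{ExpectedOutput}}",         # 3. ActualOutput missing
        "Judge the response on a 0-100 scale.",              # 4. both missing
    ],
)
def test_both_placeholders_always_present(prompt):
    fixed = ensure_placeholders(prompt)
    assert "{{ActualOutput}}" in fixed
    assert "{{ExpectedOutput}}" in fixed

def test_complete_prompt_is_left_untouched():
    prompt = "Compare {{ActualOutput}} with {{ExpectedOutput}}"
    assert ensure_placeholders(prompt) == prompt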

🤖 Generated with Claude Code

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

@github-actions bot added the test:uipath-langchain (triggers tests in the uipath-langchain-python repository) and test:uipath-llamaindex (triggers tests in the uipath-llamaindex-python repository) labels on Jan 26, 2026
@Chibionos (Contributor, Author)

Enhancement: XML Tags for Better Parsing

Added XML-style opening and closing tags around the auto-added placeholders to help the LLM clearly distinguish between outputs, especially when dealing with large JSON objects.

Updated Format

When missing placeholders are auto-added, they now include tags:

## Actual Output
<ActualOutput>
{{ActualOutput}}
</ActualOutput>

## Expected Output
<ExpectedOutput>
{{ExpectedOutput}}
</ExpectedOutput>
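
A sketch of the same illustrative helper updated for the tagged format; the names and headers remain assumptions rather than the exact implementation:

def ensure_placeholders(prompt: str) -> str:
    """Append missing placeholders, wrapped in XML-style tags to mark boundaries."""
    sections = {
        "{{ActualOutput}}":
            "\n\n## Actual Output\n<ActualOutput>\n{{ActualOutput}}\n</ActualOutput>",
        "{{ExpectedOutput}}":
            "\n\n## Expected Output\n<ExpectedOutput>\n{{ExpectedOutput}}\n</ExpectedOutput>",
    }
    for placeholder, section in sections.items():
        if placeholder not in prompt:
            prompt += section
    return prompt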

Benefits

  • Clear boundaries: LLM can easily identify where each output starts and ends
  • JSON-friendly: Works well with complex nested JSON structures
  • Parseable: XML-style tags are familiar to LLMs and easy to parse
  • Consistent: Same format whether one or both placeholders are missing

Example with Large JSON

Before tags, once the placeholder is filled in it is unclear where the expected output ends and the rest of the prompt resumes:

## Expected Output
{"complex": {"nested": {"json": "here"}}, "more": "data"}

With tags, boundaries are crystal clear:

## Expected Output
<ExpectedOutput>
{"complex": {"nested": {"json": "here"}}, "more": "data"}
</ExpectedOutput>

This makes it much easier for the LLM to correctly parse and compare the actual vs expected outputs.

@Chibionos force-pushed the fix/auto-add-llm-judge-placeholders branch 2 times, most recently from 113011b to 55e1a64, on January 26, 2026 at 23:27
Chibi Vikram and others added 5 commits January 26, 2026 15:37
Instead of raising a validation error when {{ActualOutput}} or
{{ExpectedOutput}} placeholders are missing from LLM judge prompts,
automatically append them in clearly marked sections at the end.

Changes:
- Modified legacy_llm_as_judge_evaluator.py to auto-add missing placeholders
- Modified llm_as_judge_evaluator.py to auto-add missing placeholders
- If both placeholders present: no changes made
- If one placeholder missing: append only the missing one
- If both placeholders missing: append both in separate sections

This provides a better user experience by being more forgiving while
ensuring the evaluator still has access to all required data.

Fixes validation error: "Prompt must contain both {ActualOutput} and
{ExpectedOutput} placeholders"

🤖 Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Enhanced the auto-added placeholder sections with XML-style opening and
closing tags to help LLMs clearly distinguish where each output starts
and ends, especially when dealing with large JSON objects.

Changes:
- Added <ActualOutput>...</ActualOutput> tags around {{ActualOutput}}
- Added <ExpectedOutput>...</ExpectedOutput> tags around {{ExpectedOutput}}
- Updated docstrings to reflect tag addition

Example output when missing placeholder is auto-added:

## Expected Output
<ExpectedOutput>
{{ExpectedOutput}}
</ExpectedOutput>

This makes it much easier for the LLM to parse and understand the
boundaries of each section, particularly with complex nested JSON data.

🤖 Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…ition

- Add test coverage for legacy LLM judge evaluator placeholder validation
- Add test coverage for modern LLM judge mixin placeholder validation
- Tests verify XML tag structure and proper placeholder appending behavior
- Covers all scenarios: both present, one missing, both missing, multiline prompts
Fixed validation errors in legacy LLM judge placeholder tests:
- Changed category from string "AI" to LegacyEvaluatorCategory.LlmAsAJudge enum
- Changed evaluator_type from string "LLMAsJudge" to LegacyEvaluatorType.Custom enum
- Added proper enum imports from uipath.eval.models.models

All 18 tests now pass (9 legacy + 9 modern evaluator tests).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixed mypy errors by using the correct initialization pattern:
- Added EvaluatorBaseParams helper function to create base parameters
- Use model_dump() to spread parameters when creating evaluator instances
- Added missing target_output_key parameter
- Removed direct name/description parameters (they come from config)

This follows the same pattern used in test_legacy_exact_match_evaluator.py

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@Chibionos force-pushed the fix/auto-add-llm-judge-placeholders branch from 773e44c to d98a6c2 on January 26, 2026 at 23:37
@Chibionos merged commit ef7e5af into main on Jan 27, 2026 (92 checks passed)
@Chibionos deleted the fix/auto-add-llm-judge-placeholders branch on January 27, 2026 at 00:14