fix: auto-add missing placeholders to LLM judge prompts #1207
Conversation
Enhancement: XML Tags for Better Parsing

Added XML-style opening and closing tags around the auto-added placeholders to help the LLM clearly distinguish between outputs, especially when dealing with large JSON objects.

Updated Format

When missing placeholders are auto-added, they now include tags:

## Actual Output
<ActualOutput>
{{ActualOutput}}
</ActualOutput>

## Expected Output
<ExpectedOutput>
{{ExpectedOutput}}
</ExpectedOutput>

Benefits

This makes it much easier for the LLM to correctly parse and compare the actual vs. expected outputs.

Example with Large JSON

Before tags, it would be unclear where one output ends; with tags, the boundaries are explicit (see the illustration below).
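An illustrative reconstruction of the missing before/after example; the JSON payload here is invented for demonstration. Without tags, a rendered prompt runs the two outputs together:

```text
## Actual Output
{"order": {"id": 42, "items": [{"sku": "A1", "qty": 2}]}}
## Expected Output
{"order": {"id": 42, "items": [{"sku": "A1", "qty": 3}]}}
```

With tags, each output is explicitly delimited:

```text
## Actual Output
<ActualOutput>
{"order": {"id": 42, "items": [{"sku": "A1", "qty": 2}]}}
</ActualOutput>

## Expected Output
<ExpectedOutput>
{"order": {"id": 42, "items": [{"sku": "A1", "qty": 3}]}}
</ExpectedOutput>
```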
Force-pushed: 113011b to 55e1a64
Instead of raising a validation error when {{ActualOutput}} or
{{ExpectedOutput}} placeholders are missing from LLM judge prompts,
automatically append them in clearly marked sections at the end.
Changes:
- Modified legacy_llm_as_judge_evaluator.py to auto-add missing placeholders
- Modified llm_as_judge_evaluator.py to auto-add missing placeholders
- If both placeholders present: no changes made
- If one placeholder missing: append only the missing one
- If both placeholders missing: append both in separate sections
This provides a better user experience by being more forgiving while
ensuring the evaluator still has access to all required data.
Fixes validation error: "Prompt must contain both {ActualOutput} and
{ExpectedOutput} placeholders"
🤖 Generated with Claude Code (https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
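A minimal sketch of the auto-append behavior this commit describes; the helper name and exact section format are assumptions for illustration, not the repository's actual code:

```python
# Illustrative sketch only: names and section format are assumptions.
REQUIRED_SECTIONS = {
    "{{ActualOutput}}": "## Actual Output",
    "{{ExpectedOutput}}": "## Expected Output",
}

def ensure_placeholders(prompt: str) -> str:
    """Append any required placeholder the prompt is missing,
    instead of raising a validation error."""
    for placeholder, heading in REQUIRED_SECTIONS.items():
        if placeholder not in prompt:
            prompt += f"\n\n{heading}\n{placeholder}"
    return prompt

# Both present: prompt unchanged. One missing: that one is appended.
# Both missing: both sections are appended.
```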
Enhanced the auto-added placeholder sections with XML-style opening and
closing tags to help LLMs clearly distinguish where each output starts
and ends, especially when dealing with large JSON objects.
Changes:
- Added <ActualOutput>...</ActualOutput> tags around {{ActualOutput}}
- Added <ExpectedOutput>...</ExpectedOutput> tags around {{ExpectedOutput}}
- Updated docstrings to reflect tag addition
Example output when a missing placeholder is auto-added:
## Expected Output
<ExpectedOutput>
{{ExpectedOutput}}
</ExpectedOutput>
This makes it much easier for the LLM to parse and understand the
boundaries of each section, particularly with complex nested JSON data.
🤖 Generated with Claude Code (https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
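Under the same assumptions as the earlier sketch, the appended section would now wrap the placeholder in matching tags:

```python
# Illustrative sketch only; mirrors the tagged format shown above.
def tagged_section(title: str, placeholder: str) -> str:
    """Build an appended section with XML-style boundary tags."""
    tag = title.replace(" ", "")  # "Expected Output" -> "ExpectedOutput"
    return f"\n\n## {title}\n<{tag}>\n{placeholder}\n</{tag}>"

print(tagged_section("Expected Output", "{{ExpectedOutput}}"))
```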
…ition

- Add test coverage for legacy LLM judge evaluator placeholder validation
- Add test coverage for modern LLM judge mixin placeholder validation
- Tests verify XML tag structure and proper placeholder appending behavior
- Covers all scenarios: both present, one missing, both missing, multiline prompts
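A hedged sketch of what such tests might look like, written against the hypothetical ensure_placeholders helper from the sketches above (assuming it has been updated to emit the tagged format), not the repository's real test API:

```python
# Hypothetical tests; the imported module and helper are assumptions.
from judge_prompt_utils import ensure_placeholders

def test_both_present_prompt_unchanged():
    prompt = "Compare {{ActualOutput}} against {{ExpectedOutput}}."
    assert ensure_placeholders(prompt) == prompt

def test_one_missing_appends_tagged_section():
    fixed = ensure_placeholders("Evaluate {{ActualOutput}}.")
    assert "<ExpectedOutput>" in fixed
    assert "</ExpectedOutput>" in fixed

def test_both_missing_appends_both_sections():
    fixed = ensure_placeholders("Judge the answer.")
    for tag in ("ActualOutput", "ExpectedOutput"):
        assert f"<{tag}>" in fixed
        assert f"</{tag}>" in fixed
```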
Fixed validation errors in legacy LLM judge placeholder tests:

- Changed category from string "AI" to LegacyEvaluatorCategory.LlmAsAJudge enum
- Changed evaluator_type from string "LLMAsJudge" to LegacyEvaluatorType.Custom enum
- Added proper enum imports from uipath.eval.models.models

All 18 tests now pass (9 legacy + 9 modern evaluator tests).

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
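For reference, the corrected test configuration would use the enums named above; only the enum values and import path come from the commit, while the surrounding config shape is an assumption:

```python
from uipath.eval.models.models import (
    LegacyEvaluatorCategory,
    LegacyEvaluatorType,
)

# Hypothetical config shape; field names are assumptions.
evaluator_config = {
    "category": LegacyEvaluatorCategory.LlmAsAJudge,  # was the string "AI"
    "evaluator_type": LegacyEvaluatorType.Custom,     # was "LLMAsJudge"
}
```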
Fixed mypy errors by using the correct initialization pattern:

- Added EvaluatorBaseParams helper function to create base parameters
- Use model_dump() to spread parameters when creating evaluator instances
- Added missing target_output_key parameter
- Removed direct name/description parameters (they come from config)

This follows the same pattern used in test_legacy_exact_match_evaluator.py

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
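A self-contained illustration of the model_dump() spreading pattern described above, using a stand-in pydantic model rather than the real EvaluatorBaseParams:

```python
from pydantic import BaseModel

class EvaluatorBaseParams(BaseModel):
    """Stand-in for the real base-params model; fields are assumptions."""
    id: str
    name: str
    description: str

def default_base_params() -> EvaluatorBaseParams:
    return EvaluatorBaseParams(id="eval-1", name="judge", description="demo")

# model_dump() turns the model into a plain dict so it can be spread into
# a constructor, e.g. Evaluator(**params.model_dump(), target_output_key="output")
params = default_base_params()
print(params.model_dump())
```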
Force-pushed: 773e44c to d98a6c2
Summary
This PR fixes the validation error that occurs when {{ActualOutput}} or {{ExpectedOutput}} placeholders are missing from LLM judge evaluator prompts. Instead of raising an error, we now automatically append the missing placeholders in clearly marked sections at the end of the prompt.
Problem
Users were encountering this error when a prompt omitted one of the placeholders: "Prompt must contain both {ActualOutput} and {ExpectedOutput} placeholders".
Solution
Modified the validation logic in both evaluator implementations to be more user-friendly:

Behavior
- If both placeholders are present: the prompt is left unchanged
- If one placeholder is missing: only the missing one is appended
- If both placeholders are missing: both are appended in separate sections
Example
If a prompt is missing {{ExpectedOutput}}, a tagged Expected Output section is appended to the end of the prompt, as the illustration below shows.
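An illustrative before/after; the original prompt text is invented for demonstration. Before:

```text
Evaluate the response for correctness.

Actual: {{ActualOutput}}
```

After:

```text
Evaluate the response for correctness.

Actual: {{ActualOutput}}

## Expected Output
<ExpectedOutput>
{{ExpectedOutput}}
</ExpectedOutput>
```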
Changes
- src/uipath/eval/evaluators/legacy_llm_as_judge_evaluator.py: changed validate_prompt_placeholders from raising ValueError to auto-fixing the prompt
- src/uipath/eval/evaluators/llm_as_judge_evaluator.py: changed validate_prompt_placeholders from raising UiPathEvaluationError to auto-fixing the prompt

Benefits

More forgiving for users, while still ensuring the evaluator has access to all required data.
Testing
The fix can be tested with evaluator configs whose prompts contain:
- both placeholders (no change expected)
- only one of the placeholders (the missing one is appended)
- neither placeholder (both sections are appended)
🤖 Generated with Claude Code
Co-Authored-By: Claude Sonnet 4.5 noreply@anthropic.com