
Conversation

@Chibionos (Contributor)

Summary

This PR fixes the validation error that occurs when {{ActualOutput}} or {{ExpectedOutput}} placeholders are missing from LLM judge evaluator prompts. Instead of raising an error, we now automatically append the missing placeholders in clearly marked sections at the end of the prompt.

Problem

Users were encountering this error:

ValueError: Failed to create evaluator from file 'evals/evaluators/evaluator-xxx.json': 
1 validation error for LegacyLlmAsAJudgeEvaluator
prompt
  Value error, Prompt must contain both {ActualOutput} and {ExpectedOutput} placeholders

Solution

Modified the validation logic in both evaluator implementations to be more user-friendly (a sketch of the new append behavior follows the list below):

Behavior

  • Both placeholders present: No changes made to the prompt
  • One placeholder missing: Automatically append only the missing placeholder in a new section
  • Both placeholders missing: Automatically append both placeholders in separate sections
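
For illustration, a minimal Python sketch of that behavior. The helper name ensure_placeholders and the exact section headers are illustrative; the real change lives in the validate_prompt_placeholders validators listed under Changes:

def ensure_placeholders(prompt: str) -> str:
    """Append any missing {{ActualOutput}} / {{ExpectedOutput}} placeholders."""
    sections = {
        "{{ActualOutput}}": "\n\n## Actual Output\n{{ActualOutput}}",
        "{{ExpectedOutput}}": "\n\n## Expected Output\n{{ExpectedOutput}}",
    }
    for placeholder, section in sections.items():
        if placeholder not in prompt:
            prompt += section  # append only what is missing, in its own section
    return prompt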

Example

If a prompt is missing {{ExpectedOutput}}, it will be transformed from:

Evaluate this output: {{ActualOutput}}

To:

Evaluate this output: {{ActualOutput}}

## Expected Output
{{ExpectedOutput}}
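
A quick usage check of the illustrative helper sketched above, reproducing this transformation:

prompt = "Evaluate this output: {{ActualOutput}}"
print(ensure_placeholders(prompt))
# Evaluate this output: {{ActualOutput}}
#
# ## Expected Output
# {{ExpectedOutput}}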

Changes

  • Modified src/uipath/eval/evaluators/legacy_llm_as_judge_evaluator.py
    • Changed validate_prompt_placeholders from raising ValueError to auto-fixing prompt
  • Modified src/uipath/eval/evaluators/llm_as_judge_evaluator.py
    • Changed validate_prompt_placeholders from raising UiPathEvaluationError to auto-fixing prompt

Benefits

  • Better UX: No more confusing validation errors
  • Backward compatible: Prompts with both placeholders work exactly as before
  • Fail-safe: Ensures evaluators always have access to required data
  • Clear sections: Missing placeholders are added with descriptive headers

Testing

The fix can be tested with evaluator configs that have (a pytest sketch follows this list):

  1. Both placeholders present (should work as before)
  2. Only {{ActualOutput}} present ({{ExpectedOutput}} should be auto-added)
  3. Only {{ExpectedOutput}} present ({{ActualOutput}} should be auto-added)
  4. Neither placeholder present (both should be auto-added)
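
A hedged pytest sketch covering those four scenarios; it exercises the illustrative ensure_placeholders helper from the Solution section rather than the real validator entry point:

import pytest

# ensure_placeholders is the illustrative helper sketched in the Solution section.
@pytest.mark.parametrize(
    "prompt",
    [
        "Compare {{ActualOutput}} with {{ExpectedOutput}}",  # 1. both present
        "Evaluate this output: {{ActualOutput}}",            # 2. ExpectedOutput missing
        "The expected answer is {{ExpectedOutput}}",         # 3. ActualOutput missing
        "Judge the response on a 0-100 scale.",              # 4. both missing
    ],
)
def test_both_placeholders_always_present(prompt):
    fixed = ensure_placeholders(prompt)
    assert "{{ActualOutput}}" in fixed
    assert "{{ExpectedOutput}}" in fixed

def test_complete_prompt_is_left_untouched():
    prompt = "Compare {{ActualOutput}} with {{ExpectedOutput}}"
    assert ensure_placeholders(prompt) == prompt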

🤖 Generated with Claude Code

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

@github-actions bot added the test:uipath-langchain (triggers tests in the uipath-langchain-python repository) and test:uipath-llamaindex (triggers tests in the uipath-llamaindex-python repository) labels on Jan 26, 2026
@Chibionos (Contributor, Author)

Enhancement: XML Tags for Better Parsing

Added XML-style opening and closing tags around the auto-added placeholders to help the LLM clearly distinguish between outputs, especially when dealing with large JSON objects.

Updated Format

When missing placeholders are auto-added, they now include tags:

## Actual Output
<ActualOutput>
{{ActualOutput}}
</ActualOutput>

## Expected Output
<ExpectedOutput>
{{ExpectedOutput}}
</ExpectedOutput>
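
A sketch of the same illustrative helper updated for the tagged format; the names and headers remain assumptions rather than the exact implementation:

def ensure_placeholders(prompt: str) -> str:
    """Append missing placeholders, wrapped in XML-style tags to mark boundaries."""
    sections = {
        "{{ActualOutput}}":
            "\n\n## Actual Output\n<ActualOutput>\n{{ActualOutput}}\n</ActualOutput>",
        "{{ExpectedOutput}}":
            "\n\n## Expected Output\n<ExpectedOutput>\n{{ExpectedOutput}}\n</ExpectedOutput>",
    }
    for placeholder, section in sections.items():
        if placeholder not in prompt:
            prompt += section
    return prompt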

Benefits

  • Clear boundaries: LLM can easily identify where each output starts and ends
  • JSON-friendly: Works well with complex nested JSON structures
  • Parseable: XML-style tags are familiar to LLMs and easy to parse
  • Consistent: Same format whether one or both placeholders are missing

Example with Large JSON

Before tags, once the placeholder is filled in it is unclear where the expected output ends and the rest of the prompt resumes:

## Expected Output
{"complex": {"nested": {"json": "here"}}, "more": "data"}

With tags, boundaries are crystal clear:

## Expected Output
<ExpectedOutput>
{"complex": {"nested": {"json": "here"}}, "more": "data"}
</ExpectedOutput>

This makes it much easier for the LLM to correctly parse and compare the actual vs expected outputs.

@Chibionos force-pushed the fix/auto-add-llm-judge-placeholders branch 2 times, most recently from 113011b to 55e1a64, on January 26, 2026 at 23:27
Chibi Vikram and others added 5 commits January 26, 2026 15:37
Instead of raising a validation error when {{ActualOutput}} or
{{ExpectedOutput}} placeholders are missing from LLM judge prompts,
automatically append them in clearly marked sections at the end.

Changes:
- Modified legacy_llm_as_judge_evaluator.py to auto-add missing placeholders
- Modified llm_as_judge_evaluator.py to auto-add missing placeholders
- If both placeholders present: no changes made
- If one placeholder missing: append only the missing one
- If both placeholders missing: append both in separate sections

This provides a better user experience by being more forgiving while
ensuring the evaluator still has access to all required data.

Fixes validation error: "Prompt must contain both {ActualOutput} and
{ExpectedOutput} placeholders"

🤖 Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Enhanced the auto-added placeholder sections with XML-style opening and
closing tags to help LLMs clearly distinguish where each output starts
and ends, especially when dealing with large JSON objects.

Changes:
- Added <ActualOutput>...</ActualOutput> tags around {{ActualOutput}}
- Added <ExpectedOutput>...</ExpectedOutput> tags around {{ExpectedOutput}}
- Updated docstrings to reflect tag addition

Example output when missing placeholder is auto-added:

## Expected Output
<ExpectedOutput>
{{ExpectedOutput}}
</ExpectedOutput>

This makes it much easier for the LLM to parse and understand the
boundaries of each section, particularly with complex nested JSON data.

🤖 Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…ition

- Add test coverage for legacy LLM judge evaluator placeholder validation
- Add test coverage for modern LLM judge mixin placeholder validation
- Tests verify XML tag structure and proper placeholder appending behavior
- Covers all scenarios: both present, one missing, both missing, multiline prompts
Fixed validation errors in legacy LLM judge placeholder tests:
- Changed category from string "AI" to LegacyEvaluatorCategory.LlmAsAJudge enum
- Changed evaluator_type from string "LLMAsJudge" to LegacyEvaluatorType.Custom enum
- Added proper enum imports from uipath.eval.models.models

All 18 tests now pass (9 legacy + 9 modern evaluator tests).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixed mypy errors by using the correct initialization pattern:
- Added EvaluatorBaseParams helper function to create base parameters
- Use model_dump() to spread parameters when creating evaluator instances
- Added missing target_output_key parameter
- Removed direct name/description parameters (they come from config)

This follows the same pattern used in test_legacy_exact_match_evaluator.py

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@Chibionos force-pushed the fix/auto-add-llm-judge-placeholders branch from 773e44c to d98a6c2 on January 26, 2026 at 23:37
@Chibionos merged commit ef7e5af into main on Jan 27, 2026 (92 checks passed)
@Chibionos deleted the fix/auto-add-llm-judge-placeholders branch on January 27, 2026 at 00:14