FEAT Embed schema in SelfAskRefusalScorer #1432
base: main
Changes from all commits
3f82c33
eb53df6
acf2ece
df76717
6d23793
1926fd6
07ae95f
47df3a0
018107c
| Original file line number | Diff line number | Diff line change | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -492,6 +492,7 @@ async def _score_value_with_llm( | |||||||||||||
| metadata_output_key: str = "metadata", | ||||||||||||||
| category_output_key: str = "category", | ||||||||||||||
| attack_identifier: Optional[ComponentIdentifier] = None, | ||||||||||||||
| response_json_schema: Optional[dict[str, str]] = None, | ||||||||||||||
Suggested change:
- response_json_schema: Optional[dict[str, str]] = None,
+ response_json_schema: Optional[dict[str, Any]] = None,
Copilot AI, Mar 2, 2026
Docstring says the schema is used "to validate the response against", but _score_value_with_llm only forwards schema metadata to the target (which may constrain generation) and does not perform any local JSON Schema validation of the returned payload. Reword to reflect actual behavior (request/constraint) or add explicit validation if that’s the intent.
Suggested change:
- to validate the response against. Defaults to None.
+ provided to the target to guide or constrain the JSON structure of the response. Defaults to None.
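The difference between forwarding a schema to the target and validating the reply locally can be illustrated with a minimal sketch. The helper name `check_against_schema` and the example schema are hypothetical and not part of this PR. The helper checks only the top-level `required` and `type` keywords using the standard library; a real implementation would more likely rely on a full validator such as the `jsonschema` package.

```python
import json
from typing import Any

# Map a few JSON Schema type names to Python types (top level only).
_JSON_TYPES: dict[str, tuple[type, ...]] = {
    "string": (str,),
    "boolean": (bool,),
    "object": (dict,),
    "array": (list,),
}


def check_against_schema(raw_response: str, schema: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the simplified checks passed."""
    try:
        payload = json.loads(raw_response)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(payload, dict):
        return ["top-level value is not an object"]

    problems: list[str] = []
    # Every key listed under "required" must be present.
    for key in schema.get("required", []):
        if key not in payload:
            problems.append(f"missing required key: {key}")
    # Present keys must match the declared top-level type, when we know it.
    for key, spec in schema.get("properties", {}).items():
        expected = _JSON_TYPES.get(spec.get("type", ""))
        if key in payload and expected and not isinstance(payload[key], expected):
            problems.append(f"wrong type for key: {key}")
    return problems


# Hypothetical schema, loosely modelled on a refusal-scoring response.
example_schema = {
    "type": "object",
    "properties": {
        "score_value": {"type": "string"},
        "rationale": {"type": "string"},
    },
    "required": ["score_value", "rationale"],
}

ok = check_against_schema('{"score_value": "True", "rationale": "declined"}', example_schema)
bad = check_against_schema('{"score_value": true}', example_schema)
```

Without a step of this kind, the schema only influences what the target generates, which is why the reworded docstring describes it as guiding or constraining the response.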
Copilot AI, Mar 2, 2026
cast("str", response_json_schema) does not convert the schema to a string; it only silences type checking while still storing a dict in prompt_metadata. This is misleading and makes the metadata type contract unclear. Prefer serializing with json.dumps(response_json_schema) (and keep prompt_metadata values primitive), or explicitly widen the metadata typing/contract if nested objects are intended.
Suggested change:
- # The 'cast' here is ugly, but is in the pattern of json_helper.py
- # Fundamentally, Python does not offer anything in Typing to represent
- # JSON structures
- prompt_metadata["json_schema"] = cast("str", response_json_schema)
+ # Store the JSON schema as a serialized string to keep prompt_metadata values primitive
+ prompt_metadata["json_schema"] = json.dumps(response_json_schema)
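The runtime difference between the two variants can be seen in a small standalone sketch. The schema contents and the `Union[str, int]` value type of `prompt_metadata` are assumptions made for illustration; only `typing.cast`, `json.dumps`, and `json.loads` are real standard-library calls.

```python
import json
from typing import Any, Union, cast

# Hypothetical nested schema; in the PR the real one is loaded from the scorer's YAML file.
response_json_schema: dict[str, Any] = {
    "type": "object",
    "properties": {"score_value": {"type": "string"}},
    "required": ["score_value"],
    "additionalProperties": False,
}

# cast() only informs the type checker; at runtime it returns its argument unchanged,
# so the value stored would still be a dict.
casted = cast("str", response_json_schema)

# json.dumps() actually produces a string, so the metadata value stays primitive.
serialized = json.dumps(response_json_schema)

# Stand-in for prompt_metadata; its value typing is an assumption in this sketch.
prompt_metadata: dict[str, Union[str, int]] = {"json_schema": serialized}

# A consumer that needs the structure back can parse the string again.
restored = json.loads(str(prompt_metadata["json_schema"]))
```

The trade-off is that any consumer of `prompt_metadata["json_schema"]` must then parse the string, which is presumably why the comment also offers widening the metadata typing as an alternative.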
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -103,7 +103,12 @@ def __init__( | |
| prompt_path = Path(refusal_system_prompt_path) | ||
|
|
||
| self._prompt_format_string = prompt_format_string or DEFAULT_REFUSAL_PROMPT_FORMAT | ||
| self._system_prompt = SeedPrompt.from_yaml_file(prompt_path).value | ||
| seed_prompt = SeedPrompt.from_yaml_file(prompt_path) | ||
| self._system_prompt = seed_prompt.value | ||
| # If present, the following will be a full JSON object, not | ||
| # just a dict[str,str]. We are following the pattern from | ||
| # json_helper.py for representing JSON schemas as dicts. | ||
| self._response_json_schema = seed_prompt.response_json_schema | ||
|
Comment on lines +108 to +111
|
||
| self._score_category = ["refusal"] | ||
|
|
||
| def _build_identifier(self) -> ComponentIdentifier: | ||
|
|
@@ -118,6 +123,7 @@ def _build_identifier(self) -> ComponentIdentifier: | |
| "system_prompt_template": self._system_prompt, | ||
| "user_prompt_template": self._prompt_format_string, | ||
| "score_aggregator": self._score_aggregator.__name__, | ||
| "response_json_schema": self._response_json_schema, | ||
| }, | ||
| children={ | ||
| "prompt_target": self._prompt_target.get_identifier(), | ||
|
|
@@ -182,6 +188,7 @@ async def _score_piece_async(self, message_piece: MessagePiece, *, objective: Op | |
| category=self._score_category, | ||
| objective=objective, | ||
| attack_identifier=message_piece.attack_identifier, | ||
| response_json_schema=self._response_json_schema, | ||
| ) | ||
|
Comment on lines 188 to 192
|
||
| score = unvalidated_score.to_score(score_value=unvalidated_score.raw_score_value, score_type="true_false") | ||
|
|
||
|
|
||
response_json_schema is declared as Optional[dict[str, str]], but the YAML schema content is a nested JSON object (contains dicts, lists, booleans). Update this to Optional[dict[str, Any]] (or a project-wide JSON type alias) to accurately model the data and avoid incorrect typing downstream.
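As a rough illustration of why `dict[str, str]` does not describe such a schema, the sketch below defines a hypothetical `JSONValue` alias and a small runtime check. Neither `JSONValue` nor `fits_str_str` exists in the project, and the example schema is invented for the demonstration.

```python
from typing import Any, Optional, Union

# Hypothetical project-wide alias for JSON data; recursive through forward references.
JSONValue = Union[str, int, float, bool, None, list["JSONValue"], dict[str, "JSONValue"]]

# Invented schema with the nested dicts, lists, and booleans mentioned in the comment.
example_schema: dict[str, Any] = {
    "type": "object",
    "properties": {"score_value": {"type": "string"}},
    "required": ["score_value"],
    "additionalProperties": False,
}


def fits_str_str(value: dict[str, Any]) -> bool:
    """Return True only if every key and every value is a plain string."""
    return all(isinstance(k, str) and isinstance(v, str) for k, v in value.items())


def accepts_schema(response_json_schema: Optional[dict[str, Any]] = None) -> bool:
    """Mirror the widened parameter type from the suggestion; report whether a schema was given."""
    return response_json_schema is not None


flat_ok = fits_str_str({"type": "object"})
nested_ok = fits_str_str(example_schema)
```

Here `nested_ok` is false because the `properties`, `required`, and `additionalProperties` values are a dict, a list, and a boolean, so only the wider annotation models the data.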