FEAT Embed schema in SelfAskRefusalScorer #1432
riedgar-ms wants to merge 7 commits into Azure:main
Conversation
Pull request overview
This PR embeds a JSON response schema into the SelfAskRefusalScorer’s seed YAML and plumbs that schema through scoring so compatible prompt targets can request schema-constrained JSON output.
Changes:
- Add `response_json_schema` to `SeedPrompt` and populate it in the default refusal scorer YAML.
- Update `SelfAskRefusalScorer` to load and forward the schema into `_score_value_with_llm`.
- Extend `_score_value_with_llm` to accept an optional schema and attach it to `prompt_metadata` for JSON-response-capable targets.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| pyrit/score/true_false/self_ask_refusal_scorer.py | Loads schema from the seed prompt and forwards it into LLM scoring + identifier params. |
| pyrit/score/scorer.py | Adds schema parameter and injects it into request metadata for JSON response formatting. |
| pyrit/models/seeds/seed_prompt.py | Introduces a new optional response_json_schema field on SeedPrompt. |
| pyrit/datasets/score/refusal/refusal_default.yaml | Defines the refusal scorer response JSON schema and tightens the schema text in the prompt. |
    category=self._score_category,
    objective=objective,
    attack_identifier=message_piece.attack_identifier,
    response_json_schema=self._response_json_schema,
)
New behavior: the scorer now forwards a JSON schema into _score_value_with_llm, which changes the request metadata sent to targets that support JSON schema response formatting. Add/extend a unit test (e.g., in tests/unit/score/test_self_ask_refusal.py) to assert the call includes the expected prompt_metadata["json_schema"] (and that it’s correctly serialized) so regressions are caught.
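Such a test could be sketched as follows. This is a hypothetical, self-contained illustration of the assertion pattern: the `score_with` helper stands in for `_score_value_with_llm`, and the target is a plain `AsyncMock` rather than a real prompt target.

```python
import asyncio
import json
from unittest.mock import AsyncMock

# Schema used by the hypothetical scorer under test.
SCHEMA = {"type": "object", "properties": {"score_value": {"type": "string"}}}

async def score_with(target, schema):
    # Stand-in for _score_value_with_llm: serialize the schema into the
    # outgoing request metadata (names mirror the PR; target is a mock).
    prompt_metadata = {"response_format": "json", "json_schema": json.dumps(schema)}
    await target.send_prompt_async(prompt_metadata=prompt_metadata)

target = AsyncMock()
asyncio.run(score_with(target, SCHEMA))

# Assert the mock received the correctly serialized schema metadata.
_, kwargs = target.send_prompt_async.call_args
assert json.loads(kwargs["prompt_metadata"]["json_schema"]) == SCHEMA
```

Inspecting `call_args` on the mock is what closes the gap noted later in the PR description, where `send_prompt_async` is currently not validated at all.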
attack_identifier (Optional[ComponentIdentifier]): The attack identifier.
    Defaults to None.
response_json_schema (Optional[dict[str, str]]): An optional JSON schema (not just dict[str, str])
    to validate the response against. Defaults to None.
Docstring says the schema is used "to validate the response against", but _score_value_with_llm only forwards schema metadata to the target (which may constrain generation) and does not perform any local JSON Schema validation of the returned payload. Reword to reflect actual behavior (request/constraint) or add explicit validation if that’s the intent.
-    to validate the response against. Defaults to None.
+    provided to the target to guide or constrain the JSON structure of the response. Defaults to None.
metadata_output_key: str = "metadata",
category_output_key: str = "category",
attack_identifier: Optional[ComponentIdentifier] = None,
response_json_schema: Optional[dict[str, str]] = None,
response_json_schema is typed as dict[str, str], but the schema being passed around is a nested JSON object (dicts/lists/bools). This type is inaccurate and makes it easy to misuse (and contradicts the docstring note that it’s “not just dict[str, str]”). Widen this to something like dict[str, Any] (or a dedicated JSON type alias) so the signature reflects actual values.
-response_json_schema: Optional[dict[str, str]] = None,
+response_json_schema: Optional[dict[str, Any]] = None,
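An alternative to `dict[str, Any]` is the dedicated JSON type alias mentioned in the comment. A minimal sketch (the alias names `JSONValue`/`JSONObject` and the simplified function are hypothetical, not from the PR):

```python
from typing import Dict, List, Optional, Union

# Python's typing module has no built-in JSON type, so arbitrary JSON
# values need an explicit recursive Union (supported by recent mypy).
JSONValue = Union[None, bool, int, float, str, List["JSONValue"], Dict[str, "JSONValue"]]
JSONObject = Dict[str, JSONValue]

# The widened signature then models nested schemas accurately:
def score_value_with_llm(response_json_schema: Optional[JSONObject] = None) -> None:
    pass

# A nested schema (dicts, lists, booleans) now type-checks cleanly:
schema: JSONObject = {
    "type": "object",
    "required": ["score_value", "rationale"],
    "additionalProperties": False,
}
score_value_with_llm(response_json_schema=schema)
```

Defining the alias once (e.g. near `json_helper.py`) would let `SeedPrompt`, the scorer, and `_score_value_with_llm` all share one accurate type.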
# The 'cast' here is ugly, but is in the pattern of json_helper.py
# Fundamentally, Python does not offer anything in Typing to represent
# JSON structures
prompt_metadata["json_schema"] = cast("str", response_json_schema)
cast("str", response_json_schema) does not convert the schema to a string; it only silences type checking while still storing a dict in prompt_metadata. This is misleading and makes the metadata type contract unclear. Prefer serializing with json.dumps(response_json_schema) (and keep prompt_metadata values primitive), or explicitly widen the metadata typing/contract if nested objects are intended.
-# The 'cast' here is ugly, but is in the pattern of json_helper.py
-# Fundamentally, Python does not offer anything in Typing to represent
-# JSON structures
-prompt_metadata["json_schema"] = cast("str", response_json_schema)
+# Store the JSON schema as a serialized string to keep prompt_metadata values primitive
+prompt_metadata["json_schema"] = json.dumps(response_json_schema)
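A minimal sketch of why serialization is preferable to the cast (the schema contents here are made up for illustration):

```python
import json

# Serialize the nested schema once, so every prompt_metadata value
# really is a primitive string rather than a dict hidden behind a cast.
response_json_schema = {
    "type": "object",
    "properties": {"score_value": {"type": "string"}},
    "additionalProperties": False,
}
prompt_metadata = {"response_format": "json"}
prompt_metadata["json_schema"] = json.dumps(response_json_schema)

# No cast needed: the stored value is genuinely a str, and the consumer
# can recover the original structure losslessly.
assert isinstance(prompt_metadata["json_schema"], str)
assert json.loads(prompt_metadata["json_schema"]) == response_json_schema
```

With `cast("str", ...)`, by contrast, the runtime value stays a dict and only the type checker is silenced, so any consumer treating the metadata value as a string would fail.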
# Optional JSON schema for constraining the response
# Not actually dict[str,str], necessarily, but a full JSON object.
# Type follows pattern from json_helper.py since Python's `typing`
# does not include the concept of a generic JSON object.
response_json_schema: Optional[dict[str, str]] = None
response_json_schema is declared as Optional[dict[str, str]], but the YAML schema content is a nested JSON object (contains dicts, lists, booleans). Update this to Optional[dict[str, Any]] (or a project-wide JSON type alias) to accurately model the data and avoid incorrect typing downstream.
# If present, the following will be a full JSON object, not
# just a dict[str,str]. We are following the pattern from
# json_helper.py for representing JSON schemas as dicts.
self._response_json_schema = seed_prompt.response_json_schema
This comment notes the schema is a “full JSON object”, but the type flowing from SeedPrompt.response_json_schema is currently dict[str, str], which doesn’t match nested schema structures. Once the SeedPrompt field type is widened, consider annotating self._response_json_schema accordingly (e.g., Optional[dict[str, Any]]) to keep SelfAskRefusalScorer’s internal state and identifier params consistent.
I'm not massively keen on how I've hidden the type of the schema (and this hiding has provoked most of Copilot's comments). The solution would be defining a `JSONObject` type alias.
Start the response with **only** a valid JSON object.

Your response should contain a JSON response with the following keys: "score_value", "rationale"
- "score_value" (str): The True or False result as described above. This could be "True" or "False".
Out of scope for this PR, but... JSON does define a boolean type.
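To illustrate the point: if `score_value` were a native JSON boolean rather than the strings "True"/"False", the schema could declare it directly. This fragment is purely illustrative and not part of the PR:

```python
import json

# Hypothetical schema using JSON's native boolean type for score_value.
schema = {
    "type": "object",
    "properties": {
        "score_value": {"type": "boolean"},
        "rationale": {"type": "string"},
    },
    "required": ["score_value", "rationale"],
}

# A conforming response then parses straight to a Python bool,
# with no "True"/"False" string comparison needed downstream.
response = json.loads('{"score_value": true, "rationale": "Explicit refusal."}')
assert isinstance(response["score_value"], bool)
```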
Description
The `SelfAskRefusalScorer` is written to expect a JSON response, and the default prompt in its text specifies a particular schema. Augment the seed YAML with this schema, and ensure it can be passed to the model. The lack of `JSONObject` in Python's `typing` module causes a certain amount of `mypy` ugliness.

This pattern could be rolled out more widely.
Noticed in the course of #1346
Tests and Documentation
No tests currently, since this shouldn't affect any behaviour. Something could probably be added in `test_self_ask_refusal_scorer.py`, but right now the call to `send_prompt_async` isn't validated in any way by the mock.