Branch updated from 95af585 to 99ab63b.
Pull request overview
Adds a new remote seed dataset loader for the SALAD-Bench HuggingFace dataset, making it available through PyRIT’s automatic SeedDatasetProvider discovery and documenting it in the dataset-loading guide.
Changes:
- Added `_SaladBenchDataset` remote loader that fetches SALAD-Bench from HuggingFace and converts rows into `SeedPrompt`s.
- Registered the loader for auto-discovery via `pyrit.datasets.seed_datasets.remote.__init__`.
- Added unit tests and updated the “Loading Built-in Datasets” notebook to show the new dataset name.
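Below is a minimal sketch of the row-to-prompt conversion described above. It is written in plain Python so it runs without PyRIT or network access. It is not the PR's code: `SeedPromptSketch` is a hypothetical stand-in for PyRIT's `SeedPrompt`, and the row keys `prompt`, `categories`, and `source` are assumptions about the SALAD-Bench schema.

```python
from dataclasses import dataclass, field
from typing import Any, Iterable


@dataclass
class SeedPromptSketch:
    """Hypothetical stand-in for PyRIT's SeedPrompt (only the fields used here)."""

    value: str
    dataset_name: str
    harm_categories: list[str] = field(default_factory=list)
    source: str = ""


def rows_to_prompts(
    rows: Iterable[dict[str, Any]], dataset_name: str = "salad_bench"
) -> list[SeedPromptSketch]:
    """Convert dataset rows into prompt objects, skipping rows with empty text.

    Assumes each row has a "prompt" string and optional "categories"/"source" keys.
    """
    prompts = []
    for row in rows:
        text = (row.get("prompt") or "").strip()
        if not text:
            continue  # skip rows without usable prompt text
        categories = row.get("categories") or []
        if isinstance(categories, str):
            categories = [categories]  # normalize a single label into a list
        prompts.append(
            SeedPromptSketch(
                value=text,
                dataset_name=dataset_name,
                harm_categories=list(categories),
                source=row.get("source", ""),
            )
        )
    return prompts
```

The sketch skips blank rows and turns a single string label into a list, so filtering by harm category later stays uniform. The PR does not say whether the real loader does either.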
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `pyrit/datasets/seed_datasets/remote/salad_bench_dataset.py` | New HuggingFace-backed loader that maps SALAD-Bench entries into `SeedDataset`/`SeedPrompt`. |
| `pyrit/datasets/seed_datasets/remote/__init__.py` | Imports/exports `_SaladBenchDataset` so it’s registered and discoverable. |
| `tests/unit/datasets/test_salad_bench_dataset.py` | Unit tests validating dataset fetching and config passthrough behavior. |
| `doc/code/datasets/1_loading_datasets.ipynb` | Documentation notebook updated to reflect the new dataset in the available list (but currently includes executed outputs/metadata). |
Comments suppressed due to low confidence (1)
pyrit/datasets/seed_datasets/remote/salad_bench_dataset.py:74
- The `authors` list formatting is inconsistent with other remote dataset loaders and is hard to read (and likely exceeds the repo’s 120-char line length). Please format the `authors` list across multiple lines (one author per line) like other dataset loaders for readability and consistent styling.
```python
dataset_name=self.hf_dataset_name,
config=self.config,
```
```json
"execution_count": 1,
"id": "bf7e4f32",
"metadata": {
 "execution": {
  "iopub.execute_input": "2026-03-01T14:32:27.367046Z",
  "iopub.status.busy": "2026-03-01T14:32:27.366836Z",
  "iopub.status.idle": "2026-03-01T14:32:31.994319Z",
  "shell.execute_reply": "2026-03-01T14:32:31.993825Z"
 }
},
"outputs": [
```
This notebook now contains executed state (non-null execution_count, rich outputs, and execution timestamps). Please clear outputs and reset execution_count to null before committing so docs remain deterministic and don’t include run-specific noise.
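Clearing executed state means three things: emptying `outputs`, setting `execution_count` to null, and dropping the per-cell `metadata.execution` timestamps shown above. Below is a stdlib-only sketch that assumes the nbformat v4 JSON layout. `clear_notebook_state` and `clear_notebook_file` are hypothetical helper names, not part of PyRIT.

```python
import json
from pathlib import Path
from typing import Any


def clear_notebook_state(nb: dict[str, Any]) -> dict[str, Any]:
    """Strip outputs, execution counts, and execution timestamps from code cells in place."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown/raw cells carry no outputs or execution counts
        cell["outputs"] = []
        cell["execution_count"] = None  # serialized as JSON null
        # Remove run-specific timing metadata such as the iopub.* timestamps
        cell.get("metadata", {}).pop("execution", None)
    return nb


def clear_notebook_file(path: str) -> None:
    """Rewrite a .ipynb file on disk with its executed state removed."""
    p = Path(path)
    nb = json.loads(p.read_text(encoding="utf-8"))
    clear_notebook_state(nb)
    # indent=1 plus a trailing newline approximates how Jupyter itself writes notebooks
    p.write_text(json.dumps(nb, indent=1, ensure_ascii=False) + "\n", encoding="utf-8")
```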
```json
"Found default environment files: ['C:\\\\Users\\\\romanlutz\\\\.pyrit\\\\.env', 'C:\\\\Users\\\\romanlutz\\\\.pyrit\\\\.env.local']\n",
"Loaded environment file: C:\\Users\\romanlutz\\.pyrit\\.env\n",
"Loaded environment file: C:\\Users\\romanlutz\\.pyrit\\.env.local\n"
```
The committed notebook output includes user/machine-specific local paths (e.g., C:\\Users\\... and .env locations). Please remove these outputs/paths (by clearing cell outputs) to avoid leaking local environment details into the repo.
Add remote dataset loader for SALAD-Bench (walledai/SaladBench), a hierarchical safety benchmark with ~30k prompts organized into 6 domains, 16 tasks, and 65+ categories (ACL 2024).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
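The three-level hierarchy (domains, tasks, categories) lends itself to nested counts. The sketch below assumes each row stores its labels as an ordered `categories` list of `[domain, task, category]`. That layout is an assumption; it does not come from the PR or the dataset card.

```python
from collections import Counter, defaultdict
from typing import Any, Iterable


def count_by_level(rows: Iterable[dict[str, Any]]) -> tuple[Counter, dict[str, Counter]]:
    """Count prompts per top-level domain and per task within each domain.

    Assumes (hypothetically) that each row has a "categories" list ordered [domain, task, category].
    """
    per_domain: Counter = Counter()
    per_task: dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        labels = row.get("categories") or []
        if len(labels) < 2:
            continue  # skip rows missing domain/task labels
        domain, task = labels[0], labels[1]
        per_domain[domain] += 1
        per_task[domain][task] += 1
    # Return a plain dict so unknown domains raise KeyError instead of being silently created
    return per_domain, dict(per_task)
```

After loading, a cheap sanity check is to compare `len(per_domain)` against the advertised 6 domains.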
Branch updated from 99ab63b to 7db0e9c.
Add remote dataset loader for SALAD-Bench (walledai/SaladBench), a hierarchical safety benchmark with ~30k prompts organized into 6 domains, 16 tasks, and 65+ categories (ACL 2024).