FEAT Add SimpleSafetyTests dataset loader by romanlutz · Pull Request #1426 · Azure/PyRIT

romanlutz · 2026-03-01T14:35:25Z

Add remote dataset loader for SimpleSafetyTests (Bertievidgen/SimpleSafetyTests), a lightweight diagnostic set of 100 critical safety test prompts for quickly evaluating the most basic safety properties of LLMs.

Copilot

Pull request overview

Adds a new remote seed dataset loader for HuggingFace dataset Bertievidgen/SimpleSafetyTests so it can be discovered and fetched via PyRIT’s SeedDatasetProvider mechanism.

Changes:

Introduces the `_SimpleSafetyTestsDataset` remote loader, which fetches prompts from HuggingFace and converts them into SeedPrompts.
Adds unit tests covering the new loader's `fetch_dataset` behavior and `dataset_name`.
Updates the dataset-loading documentation notebook output to include `simple_safety_tests` in the list of available datasets.
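The fetch-and-convert step described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the PR's implementation: `SeedPromptSketch` and `convert_rows` are hypothetical stand-ins (not PyRIT's `SeedPrompt` API), and the `prompt` and `category` column names are assumed rather than confirmed here. The source URL and `groups` values are taken from the diff quoted further down.

```python
from dataclasses import dataclass, field
from typing import Any, Iterable

SOURCE_URL = "https://huggingface.co/datasets/Bertievidgen/SimpleSafetyTests"


# Hypothetical stand-in for PyRIT's SeedPrompt; field names are illustrative only.
@dataclass
class SeedPromptSketch:
    value: str
    dataset_name: str
    source: str
    authors: list[str]
    groups: list[str]
    metadata: dict[str, Any] = field(default_factory=dict)


def convert_rows(rows: Iterable[dict[str, Any]], authors: list[str]) -> list[SeedPromptSketch]:
    """Convert raw dataset rows into prompt objects, skipping rows with no prompt text."""
    prompts = []
    for item in rows:
        # "prompt" is an ASSUMED column name for the prompt text.
        text = (item.get("prompt") or "").strip()
        if not text:
            continue  # design choice in this sketch: never emit blank prompts
        prompts.append(
            SeedPromptSketch(
                value=text,
                dataset_name="simple_safety_tests",
                source=SOURCE_URL,
                authors=authors,
                groups=["Patronus AI", "University of Oxford", "Bocconi University"],
                # Keep None for a missing category rather than defaulting to "".
                metadata={"category": item.get("category")},
            )
        )
    return prompts
```

In the real loader, the rows would come from the HuggingFace dataset rather than an in-memory list. Keeping the conversion as a pure function makes it easy to unit-test without network access.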

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
|---|---|
| `pyrit/datasets/seed_datasets/remote/simple_safety_tests_dataset.py` | New HuggingFace-backed dataset loader for SimpleSafetyTests. |
| `pyrit/datasets/seed_datasets/remote/__init__.py` | Exposes the new loader so it is registered and discovered alongside the other remote providers. |
| `tests/unit/datasets/test_simple_safety_tests_dataset.py` | Adds unit tests for the new loader. |
| `doc/code/datasets/1_loading_datasets.ipynb` | Updates the "Loading Built-in Datasets" notebook to reflect the new dataset name, but also includes newly executed metadata and outputs. |

Copilot · 2026-03-01T14:41:20Z

pyrit/datasets/seed_datasets/remote/simple_safety_tests_dataset.py

+        authors = ["Bertie Vidgen", "Nino Scherrer", "Hannah Rose Kirk", "Rebecca Qian",
+                    "Anand Kannappan", "Scott Niekum", "Paul Röttger"]


The authors list formatting is inconsistent with the rest of the remote dataset loaders and makes it harder to read/maintain. Consider formatting it as a multi-line list with one author per line (as done in other loaders) to keep style consistent.

Suggested change

-        authors = ["Bertie Vidgen", "Nino Scherrer", "Hannah Rose Kirk", "Rebecca Qian",
-                    "Anand Kannappan", "Scott Niekum", "Paul Röttger"]
+        authors = [
+            "Bertie Vidgen",
+            "Nino Scherrer",
+            "Hannah Rose Kirk",
+            "Rebecca Qian",
+            "Anand Kannappan",
+            "Scott Niekum",
+            "Paul Röttger",
+        ]

Copilot · 2026-03-01T14:41:20Z

doc/code/datasets/1_loading_datasets.ipynb

   "cell_type": "code",
-   "execution_count": null,
-   "id": "1",
-   "metadata": {},
+   "execution_count": 1,
+   "id": "1c3a2497",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2026-03-01T14:35:40.418153Z",
+     "iopub.status.busy": "2026-03-01T14:35:40.417953Z",
+     "iopub.status.idle": "2026-03-01T14:35:45.183664Z",
+     "shell.execute_reply": "2026-03-01T14:35:45.183155Z"
+    }
+   },


This notebook now includes execution metadata (execution_count, iopub timestamps, etc.). Most notebooks in doc/code keep execution_count as null and omit per-run execution metadata to avoid noisy diffs and non-reproducible timestamps; consider clearing execution metadata before committing.
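The cleanup this comment asks for can be scripted over the notebook JSON. The following is a minimal sketch using only the standard library. `clear_execution_metadata` and `clean_notebook_file` are hypothetical helper names, and the sketch assumes the nbformat 4 layout visible in the diff above. Existing tools such as `nbstripout` cover similar ground.

```python
import json
from pathlib import Path
from typing import Any


def clear_execution_metadata(nb: dict[str, Any]) -> dict[str, Any]:
    """Reset per-run execution state in an nbformat-4 notebook dict, in place."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        cell["execution_count"] = None
        # Drop the iopub/shell timestamps Jupyter records under metadata["execution"].
        cell.get("metadata", {}).pop("execution", None)
        # execute_result outputs carry their own execution_count as well.
        for output in cell.get("outputs", []):
            if "execution_count" in output:
                output["execution_count"] = None
    return nb


def clean_notebook_file(path: Path) -> None:
    """Rewrite a notebook file with execution metadata cleared."""
    nb = json.loads(path.read_text(encoding="utf-8"))
    clear_execution_metadata(nb)
    # Roughly mirror Jupyter's on-disk layout (sorted keys, 1-space indent) to keep diffs small.
    text = json.dumps(nb, indent=1, sort_keys=True, ensure_ascii=False) + "\n"
    path.write_text(text, encoding="utf-8")
```

Cell `id`s and other metadata keys are left untouched. Only the per-run fields the comment mentions are reset.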

Copilot · 2026-03-01T14:41:20Z

doc/code/datasets/1_loading_datasets.ipynb

+      "Found default environment files: ['C:\\\\Users\\\\romanlutz\\\\.pyrit\\\\.env', 'C:\\\\Users\\\\romanlutz\\\\.pyrit\\\\.env.local']\n",
+      "Loaded environment file: C:\\Users\\romanlutz\\.pyrit\\.env\n",
+      "Loaded environment file: C:\\Users\\romanlutz\\.pyrit\\.env.local\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
-      "C:\\Users\\anasundar\\AppData\\Local\\Temp\\ipykernel_49000\\4021500943.py:10: DeprecationWarning: is_objective parameter is deprecated since 0.13.0. Use seed_type='objective' instead.\n",
+      "C:\\Users\\romanlutz\\AppData\\Local\\Temp\\ipykernel_10540\\4021500943.py:10: DeprecationWarning: is_objective parameter is deprecated since 0.13.0. Use seed_type='objective' instead.\n",
      "  memory.get_seeds(harm_categories=[\"illegal\"], is_objective=True)\n"
     ]
    },


The committed outputs include local machine paths/usernames (e.g., C:\\Users\\...) from environment loading. This can leak personal/workstation details into the repo; please clear these outputs (or replace them with sanitized, non-user-specific examples) before committing the notebook.
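A pre-commit check for this kind of leak can be sketched as a scan of the notebook outputs for home-directory paths, paired with a helper that clears the outputs. This is a heuristic sketch. The regex covers only `C:\Users\<name>`, `/home/<name>`, and `/Users/<name>` prefixes, and `find_user_paths` and `clear_outputs` are hypothetical names, not existing PyRIT tooling. The pattern allows one or more backslashes because printed Python lists, like the first stdout line above, show doubled backslashes.

```python
import re
from typing import Any

# Heuristic: Windows user profiles and Unix/macOS home directories.
USER_PATH_RE = re.compile(r"[A-Za-z]:\\+Users\\+[^\\/\s'\"]+|/(?:home|Users)/[^/\s'\"]+")


def _output_text(output: dict[str, Any]) -> str:
    """Collect the plain-text parts of one output (stream text or text/plain data)."""
    parts = []
    for value in (output.get("text"), output.get("data", {}).get("text/plain")):
        if isinstance(value, list):  # nbformat allows multiline strings stored as lists
            parts.append("".join(value))
        elif isinstance(value, str):
            parts.append(value)
    return "\n".join(parts)


def find_user_paths(nb: dict[str, Any]) -> list[tuple[int, str]]:
    """Return (cell index, matched path prefix) for each user-specific path in outputs."""
    hits = []
    for idx, cell in enumerate(nb.get("cells", [])):
        for output in cell.get("outputs", []):
            for match in USER_PATH_RE.finditer(_output_text(output)):
                hits.append((idx, match.group(0)))
    return hits


def clear_outputs(nb: dict[str, Any]) -> dict[str, Any]:
    """Remove all outputs from code cells, in place."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
    return nb
```

Failing a commit when `find_user_paths` returns anything, and clearing outputs otherwise, would catch the environment-loading lines shown in this diff.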

Copilot · 2026-03-01T14:41:20Z

pyrit/datasets/seed_datasets/remote/simple_safety_tests_dataset.py

+                source="https://huggingface.co/datasets/Bertievidgen/SimpleSafetyTests",
+                authors=authors,
+                groups=["Patronus AI", "University of Oxford", "Bocconi University"],
+                metadata={"category": item.get("category", "")},


groups is set to [item.get("category", "")], which will produce a list containing an empty string when the category is missing/blank. Other dataset loaders avoid this by using an empty list when the category is falsy (e.g., groups=[category] if category else []), which prevents polluting group filters/queries with "".

Suggested change

-                metadata={"category": item.get("category", "")},
+                metadata={"category": item.get("category")},
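The pattern the comment attributes to other loaders, `groups=[category] if category else []`, together with the suggested `None` default for metadata, can be sketched as two small helpers. `category_groups` and `category_metadata` are hypothetical names. The whitespace check is a slightly stricter assumption than the plain truthiness test the comment quotes.

```python
from typing import Any, Optional


def category_groups(item: dict[str, Any]) -> list[str]:
    """Return [category] only when the row has a non-blank category; otherwise []."""
    category: Optional[str] = item.get("category")
    if category and category.strip():
        return [category.strip()]
    return []  # avoids polluting group filters with ""


def category_metadata(item: dict[str, Any]) -> dict[str, Any]:
    """Store None for a missing category instead of defaulting to ""."""
    return {"category": item.get("category")}
```

Returning an empty list means a query that filters on groups never matches an accidental `""` entry.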

Add remote dataset loader for SimpleSafetyTests (Bertievidgen/SimpleSafetyTests), a lightweight diagnostic set of 100 critical safety test prompts for quickly evaluating the most basic safety properties of LLMs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings March 1, 2026 14:35

Copilot started reviewing on behalf of romanlutz March 1, 2026 14:35

romanlutz force-pushed the romanlutz/add-simple-safety-tests-dataset branch from 7a32aee to 0b9827a March 1, 2026 14:35

Copilot AI reviewed Mar 1, 2026

romanlutz force-pushed the romanlutz/add-simple-safety-tests-dataset branch from 0b9827a to 1efaa09 March 1, 2026 14:53

romanlutz wants to merge 1 commit into Azure:main from romanlutz:romanlutz/add-simple-safety-tests-dataset

2 participants