
FEAT Add ToxicChat dataset loader#1422

Open
romanlutz wants to merge 1 commit into Azure:main from romanlutz:romanlutz/add-toxic-chat-dataset

Conversation

@romanlutz
Contributor

Add remote dataset loader for ToxicChat (lmsys/toxic-chat), containing ~10k real user-chatbot conversations from Chatbot Arena annotated for toxicity and jailbreaking attempts. Gracefully skips entries with Jinja2-incompatible content.

Copilot AI review requested due to automatic review settings March 1, 2026 14:20
@romanlutz romanlutz changed the title Add ToxicChat dataset loader FEAT Add ToxicChat dataset loader Mar 1, 2026
Contributor

Copilot AI left a comment


Pull request overview

Adds a new remote seed dataset loader for the HuggingFace lmsys/toxic-chat dataset so it can be discovered/loaded through PyRIT’s dataset provider framework, with accompanying unit tests and docs update.

Changes:

  • Introduces _ToxicChatDataset remote loader that converts ToxicChat rows into SeedPrompts and skips prompts that fail template parsing.
  • Adds unit tests for the ToxicChat loader.
  • Updates dataset-loading documentation output to include toxic_chat in the available dataset list.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Files changed:

  • pyrit/datasets/seed_datasets/remote/toxic_chat_dataset.py: New HuggingFace-backed loader that builds SeedDataset/SeedPrompt entries from ToxicChat rows and skips problematic entries.
  • tests/unit/datasets/test_toxic_chat_dataset.py: New unit tests validating basic loading behavior and custom HF config plumbing.
  • pyrit/datasets/seed_datasets/remote/__init__.py: Exposes/imports the new loader for discovery/registration.
  • doc/code/datasets/1_loading_datasets.ipynb: Updates the displayed list of available datasets to include toxic_chat.

Comments suppressed due to low confidence (1)

tests/unit/datasets/test_toxic_chat_dataset.py:78

  • For consistency with other dataset loader tests (e.g., DarkBench), this test should assert that split (and optionally cache) are forwarded to _fetch_from_huggingface in addition to dataset_name and config; otherwise regressions in argument plumbing won't be caught.
        with patch.object(loader, "_fetch_from_huggingface", return_value=mock_toxic_chat_data) as mock_fetch:
            dataset = await loader.fetch_dataset()

            assert len(dataset.seeds) == 2
            mock_fetch.assert_called_once()
            call_kwargs = mock_fetch.call_args.kwargs
            assert call_kwargs["dataset_name"] == "custom/toxic-chat"
            assert call_kwargs["config"] == "custom_config"

Comment on lines 98 to 102
    except Exception:
        skipped += 1
        logger.debug(
            f"Skipped ToxicChat entry due to template parsing error (e.g. HTML content)"
        )

Copilot AI Mar 1, 2026


This loader's key behavior is to "gracefully skip" entries that fail Jinja2 parsing during SeedPrompt construction, but the unit tests don't cover that path. Add a test case with a deliberately Jinja2-invalid user_input and assert that fetch_dataset returns the remaining valid entries (and does not raise).
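One possible shape for that test, sketched with stand-ins since the real `SeedPrompt` and loader classes live in PyRIT; the fake rows, the `"{% broken"` input, and the unbalanced-tag validity check are all illustrative assumptions, not the real dataset schema or parser:

```python
# Self-contained sketch of the missing skip-path test. FakeSeedPrompt stands in
# for PyRIT's SeedPrompt, which rejects Jinja2-invalid values at construction.
class TemplateError(Exception):
    pass

class FakeSeedPrompt:
    def __init__(self, value: str):
        # Stand-in validity check: unbalanced Jinja2 block tags fail to parse.
        if value.count("{%") != value.count("%}"):
            raise TemplateError(f"invalid template: {value!r}")
        self.value = value

def load_rows(rows):
    """Mirror the loader's loop: keep valid prompts, count and skip failures."""
    seeds, skipped = [], 0
    for item in rows:
        try:
            seeds.append(FakeSeedPrompt(item["user_input"]))
        except TemplateError:
            skipped += 1
    return seeds, skipped

rows = [{"user_input": "hello"}, {"user_input": "{% broken"}]
seeds, skipped = load_rows(rows)
assert len(seeds) == 1 and seeds[0].value == "hello"  # valid entry kept
assert skipped == 1  # invalid entry skipped, nothing raised
```

In the actual suite the same pattern would go through `loader.fetch_dataset()` with `_fetch_from_huggingface` patched to return the mixed rows, asserting that only the valid entries come back.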

Copilot generated this review using guidance from repository custom instructions.
Comment on lines 80 to 102
for item in data:
    try:
        seed_prompts.append(
            SeedPrompt(
                value=item["user_input"],
                data_type="text",
                dataset_name=self.dataset_name,
                description=description,
                source="https://huggingface.co/datasets/lmsys/toxic-chat",
                authors=authors,
                groups=["UC San Diego"],
                metadata={
                    "toxicity": str(item.get("toxicity", "")),
                    "jailbreaking": str(item.get("jailbreaking", "")),
                    "human_annotation": str(item.get("human_annotation", "")),
                },
            )
        )
    except Exception:
        skipped += 1
        logger.debug(
            f"Skipped ToxicChat entry due to template parsing error (e.g. HTML content)"
        )

Copilot AI Mar 1, 2026


The except Exception here is overly broad and will silently skip records for non-template-related issues (e.g., dataset schema changes like missing user_input). Consider catching the specific Jinja2 exception(s) raised during SeedPrompt initialization (e.g., TemplateSyntaxError) and re-raising unexpected exceptions; also include the underlying exception (and ideally conv_id) in the debug log via exc_info=True so skips are diagnosable.
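A sketch of the suggested narrowing; `TemplateSyntaxError` is Jinja2's real exception class, while `build_prompt` and the `conv_id` field are stand-ins (the review notes the real failures happen inside SeedPrompt construction):

```python
import logging

try:
    from jinja2.exceptions import TemplateSyntaxError  # the real Jinja2 exception
except ImportError:  # fallback stand-in so the sketch runs without jinja2
    class TemplateSyntaxError(Exception):
        pass

logger = logging.getLogger(__name__)

def build_prompt(item):
    # Stand-in for SeedPrompt construction, which parses the value as a
    # Jinja2 template and raises TemplateSyntaxError on bad input.
    if "{%" in item["user_input"]:
        raise TemplateSyntaxError("unexpected end of template", 1)
    return item["user_input"]

rows = [{"conv_id": "a1", "user_input": "hi"}, {"conv_id": "b2", "user_input": "{% oops"}]
seeds, skipped = [], 0
for item in rows:
    try:
        seeds.append(build_prompt(item))
    except TemplateSyntaxError:
        # Narrow catch: only template errors are skipped. Schema problems such
        # as a missing user_input key (KeyError) propagate instead of being
        # silently swallowed.
        skipped += 1
        logger.debug(
            "Skipped ToxicChat entry %s due to template parsing error",
            item.get("conv_id"),
            exc_info=True,  # attach the underlying traceback to the log record
        )
```

With `exc_info=True` and the entry id in the message, a skipped record is diagnosable from debug logs rather than disappearing without a trace.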

@romanlutz romanlutz force-pushed the romanlutz/add-toxic-chat-dataset branch from b66a27b to cb93f7f Compare March 1, 2026 14:37
Add remote dataset loader for ToxicChat (lmsys/toxic-chat), containing ~10k real
user-chatbot conversations from Chatbot Arena annotated for toxicity and
jailbreaking attempts. Gracefully skips entries with Jinja2-incompatible content.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@romanlutz romanlutz force-pushed the romanlutz/add-toxic-chat-dataset branch from cb93f7f to c63374c Compare March 1, 2026 14:54
Copilot AI review requested due to automatic review settings March 1, 2026 14:54
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

tests/unit/datasets/test_toxic_chat_dataset.py:78

  • This test configures split="test" but doesn't assert that fetch_dataset() passes split through to _fetch_from_huggingface (unlike similar tests for other datasets). Adding an assertion for call_kwargs["split"] (and optionally cache) would improve coverage of the custom-config behavior.
        loader = _ToxicChatDataset(
            dataset_name="custom/toxic-chat",
            config="custom_config",
            split="test",
        )

        with patch.object(loader, "_fetch_from_huggingface", return_value=mock_toxic_chat_data) as mock_fetch:
            dataset = await loader.fetch_dataset()

            assert len(dataset.seeds) == 2
            mock_fetch.assert_called_once()
            call_kwargs = mock_fetch.call_args.kwargs
            assert call_kwargs["dataset_name"] == "custom/toxic-chat"
            assert call_kwargs["config"] == "custom_config"
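The extra assertions the review asks for might look like the following, sketched against a minimal stand-in loader since the real `_ToxicChatDataset` lives in this PR (the kwarg names mirror the snippet above; the stand-in is synchronous for brevity):

```python
from unittest.mock import patch

class StubLoader:
    """Minimal stand-in for _ToxicChatDataset's fetch plumbing."""
    def __init__(self, dataset_name, config, split, cache=True):
        self.dataset_name = dataset_name
        self.config = config
        self.split = split
        self.cache = cache

    def _fetch_from_huggingface(self, **kwargs):
        raise NotImplementedError  # patched out in the test

    def fetch_dataset(self):
        # Forward all configuration to the fetch helper, as the loader should.
        return self._fetch_from_huggingface(
            dataset_name=self.dataset_name,
            config=self.config,
            split=self.split,
            cache=self.cache,
        )

loader = StubLoader(dataset_name="custom/toxic-chat", config="custom_config", split="test")
with patch.object(loader, "_fetch_from_huggingface", return_value=[]) as mock_fetch:
    loader.fetch_dataset()

kwargs = mock_fetch.call_args.kwargs
assert kwargs["dataset_name"] == "custom/toxic-chat"
assert kwargs["config"] == "custom_config"
# The review's point: also pin down split (and cache) so regressions in
# argument plumbing are caught, not just name/config.
assert kwargs["split"] == "test"
assert kwargs["cache"] is True
```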

Comment on lines +19 to +28
    "execution_count": 1,
    "id": "69b77122",
    "metadata": {
      "execution": {
        "iopub.execute_input": "2026-03-01T14:54:22.239868Z",
        "iopub.status.busy": "2026-03-01T14:54:22.239679Z",
        "iopub.status.idle": "2026-03-01T14:54:26.784712Z",
        "shell.execute_reply": "2026-03-01T14:54:26.784206Z"
      }
    },

Copilot AI Mar 1, 2026


The notebook was executed and now includes run-specific metadata (execution counts and per-cell execution timestamps). Per repo convention for docs notebooks, please clear outputs and reset execution_count to null (and remove execution timestamp metadata) before committing to avoid noisy diffs.
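One way to strip that run-specific state, sketched with the stdlib only (in practice the repo's tooling, e.g. `jupyter nbconvert --clear-output --inplace`, may already do this; the notebook dict below is a made-up minimal example):

```python
def clear_notebook(nb: dict) -> dict:
    """Reset execution_count, drop outputs, and remove per-cell execution
    timestamps so only the source survives in the committed .ipynb JSON."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["execution_count"] = None  # serializes as null
            cell["outputs"] = []
        cell.get("metadata", {}).pop("execution", None)
    return nb

# Illustrative notebook fragment with the kinds of noise flagged above.
nb = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {"execution": {"iopub.execute_input": "2026-03-01T14:54:22Z"}},
            "outputs": [{"output_type": "stream", "text": ["Loaded environment file: ...\n"]}],
            "source": ["print('hi')"],
        }
    ]
}
cleaned = clear_notebook(nb)
assert cleaned["cells"][0]["execution_count"] is None
assert cleaned["cells"][0]["outputs"] == []
assert "execution" not in cleaned["cells"][0]["metadata"]
```

This also addresses the next comment, since clearing outputs removes any machine-specific paths captured at execution time.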

Comment on lines +210 to +212
"Found default environment files: ['C:\\\\Users\\\\romanlutz\\\\.pyrit\\\\.env', 'C:\\\\Users\\\\romanlutz\\\\.pyrit\\\\.env.local']\n",
"Loaded environment file: C:\\Users\\romanlutz\\.pyrit\\.env\n",
"Loaded environment file: C:\\Users\\romanlutz\\.pyrit\\.env.local\n"

Copilot AI Mar 1, 2026


This committed output includes local machine/user-specific paths (e.g., C:\\Users\\romanlutz\\...) and environment file names, which shouldn't be checked into docs. Please clear cell outputs (and associated metadata) to avoid leaking local information.

cache=cache,
)

authors = ["Zi Lin", "Zihan Wang", "Yongqi Tong", "Yangkun Wang", "Yuxin Guo", "Yujia Wang", "Jingbo Shang"]

Copilot AI Mar 1, 2026


Line exceeds the project's max line length (Ruff E501, 120 chars). Please reformat the authors = [...] list across multiple lines so linting passes.

Suggested change

    authors = ["Zi Lin", "Zihan Wang", "Yongqi Tong", "Yangkun Wang", "Yuxin Guo", "Yujia Wang", "Jingbo Shang"]

    authors = [
        "Zi Lin",
        "Zihan Wang",
        "Yongqi Tong",
        "Yangkun Wang",
        "Yuxin Guo",
        "Yujia Wang",
        "Jingbo Shang",
    ]
