Add remote dataset loader for OR-Bench (bench-llm/OR-Bench), an over-refusal benchmark that tests whether language models wrongly refuse safe prompts. Supports both or-bench-hard-1k and or-bench-toxic configurations.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Adds a new remote dataset loader for the HuggingFace OR-Bench benchmark (bench-llm/OR-Bench) so it can be discovered via SeedDatasetProvider and loaded as SeedDataset seeds, with support for both the or-bench-hard-1k and or-bench-toxic configurations.
Changes:
- Introduces the _ORBenchDataset remote loader, which fetches OR-Bench from HuggingFace and converts rows into SeedPrompts.
- Registers the new loader for automatic discovery and documents the new dataset name in the datasets loading notebook output.
- Adds unit tests covering default loading and the toxic config path.
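The row-to-seed conversion described in the list above might look roughly like the sketch below. It is not the code from this PR: `SeedPromptSketch` is a hypothetical stand-in for PyRIT's SeedPrompt, `rows_to_seed_prompts` is an illustrative name, and the `prompt` and `category` column names are assumptions about the OR-Bench schema. Only the two config names and the `or_bench` dataset name come from the PR text.

```python
from dataclasses import dataclass, field

# Config names are taken from the PR description; everything else is illustrative.
OR_BENCH_CONFIGS = ("or-bench-hard-1k", "or-bench-toxic")


@dataclass
class SeedPromptSketch:
    """Hypothetical stand-in for PyRIT's SeedPrompt, not the real class."""

    value: str
    dataset_name: str
    harm_categories: list = field(default_factory=list)


def rows_to_seed_prompts(rows, config="or-bench-hard-1k"):
    """Convert raw dataset rows (dicts) into seed prompt objects."""
    if config not in OR_BENCH_CONFIGS:
        raise ValueError(f"Unknown OR-Bench config: {config!r}")

    prompts = []
    for row in rows:
        text = row.get("prompt")  # assumed column name
        if not text:
            continue  # skip rows with a missing or empty prompt
        category = row.get("category")  # assumed column name
        prompts.append(
            SeedPromptSketch(
                value=text,
                dataset_name="or_bench",
                harm_categories=[category] if category else [],
            )
        )
    return prompts


# Usage with in-memory rows, so no network access is needed.
sample_rows = [
    {"prompt": "How do I safely dispose of old medication?", "category": "harmful"},
    {"prompt": "", "category": "privacy"},
]
seeds = rows_to_seed_prompts(sample_rows)
print(len(seeds), seeds[0].harm_categories)
```

Keeping the mapping separate from the fetch step means it can be exercised with plain dictionaries, which is one plausible way to unit test it without contacting HuggingFace.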
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| pyrit/datasets/seed_datasets/remote/or_bench_dataset.py | Implements the OR-Bench HuggingFace-backed dataset loader and maps records into SeedPrompts. |
| pyrit/datasets/seed_datasets/remote/__init__.py | Imports/exports _ORBenchDataset to trigger provider registration and expose it from the remote loaders package. |
| tests/unit/datasets/test_or_bench_dataset.py | Adds unit tests validating prompt mapping and config propagation to the HuggingFace fetch helper. |
| doc/code/datasets/1_loading_datasets.ipynb | Updates the displayed list of available datasets to include or_bench. |
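As a rough illustration of what "config propagation to the HuggingFace fetch helper" could mean in a unit test, the sketch below injects a mock in place of the fetch helper and checks the arguments it receives. `load_or_bench` and its `fetch` parameter are hypothetical and do not reflect the actual loader's signature or the tests in this PR; the assumed `prompt` column is also a guess.

```python
from unittest.mock import MagicMock


def load_or_bench(fetch, config="or-bench-hard-1k"):
    """Hypothetical loader that delegates fetching to an injected helper."""
    rows = fetch(dataset_name="bench-llm/OR-Bench", config=config)
    return [row["prompt"] for row in rows]  # "prompt" column is assumed


def test_toxic_config_is_forwarded():
    fetch = MagicMock(return_value=[{"prompt": "example prompt"}])
    prompts = load_or_bench(fetch, config="or-bench-toxic")
    assert prompts == ["example prompt"]
    # The helper should see exactly the config the caller asked for.
    fetch.assert_called_once_with(
        dataset_name="bench-llm/OR-Bench", config="or-bench-toxic"
    )


def test_default_config_is_hard_1k():
    fetch = MagicMock(return_value=[])
    assert load_or_bench(fetch) == []
    assert fetch.call_args.kwargs["config"] == "or-bench-hard-1k"


# Run directly; under pytest these functions would be collected automatically.
test_toxic_config_is_forwarded()
test_default_config_is_hard_1k()
```

Mocking the fetch helper keeps the test offline and fast, at the cost of not catching schema changes in the upstream dataset.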