FEAT Add HarmfulQA dataset loader #1421

Open

romanlutz wants to merge 1 commit into Azure:main from romanlutz:romanlutz/add-harmful-qa-dataset
Conversation

@romanlutz
Contributor

Add remote dataset loader for HarmfulQA (declare-lab/HarmfulQA), containing ~2k harmful questions organized by academic topic and subtopic for testing LLM susceptibility to harm-inducing question-answering.

Copilot AI review requested due to automatic review settings March 1, 2026 14:14
@romanlutz romanlutz force-pushed the romanlutz/add-harmful-qa-dataset branch from f8de803 to e996238 Compare March 1, 2026 14:16

Copilot AI left a comment


Pull request overview

Adds a new remote seed-dataset provider for the HuggingFace declare-lab/HarmfulQA dataset so it can be fetched and used as SeedPrompt entries within PyRIT’s dataset discovery/registration system.

Changes:

  • Introduced _HarmfulQADataset remote loader that fetches HarmfulQA from HuggingFace and converts rows into SeedPrompts.
  • Exported the new loader from pyrit.datasets.seed_datasets.remote to trigger auto-registration.
  • Added unit tests validating basic fetch + conversion behavior and dataset_name.
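
The row-to-prompt conversion described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: it assumes HarmfulQA rows expose `question`, `topic`, and `subtopic` fields (as the review excerpt suggests), and uses a plain dataclass as a stand-in for PyRIT's SeedPrompt with only the fields visible in the excerpt.

```python
from dataclasses import dataclass, field


@dataclass
class SeedPromptStub:
    """Stand-in for pyrit's SeedPrompt; field names taken from the review excerpt."""
    value: str
    dataset_name: str
    harm_categories: list
    metadata: dict = field(default_factory=dict)


def row_to_prompt(item: dict, dataset_name: str = "declare-lab/HarmfulQA") -> SeedPromptStub:
    """Convert one HarmfulQA row into a seed-prompt record.

    The topic becomes the harm category and the subtopic is carried
    as metadata, mirroring the pattern shown in the PR excerpt.
    """
    return SeedPromptStub(
        value=item["question"],
        dataset_name=dataset_name,
        harm_categories=[item["topic"]],
        metadata={"subtopic": item.get("subtopic", "")},
    )


row = {"question": "How can X be misused?", "topic": "Chemistry", "subtopic": "Toxicology"}
prompt = row_to_prompt(row)
print(prompt.harm_categories)  # ['Chemistry']
```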

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File: pyrit/datasets/seed_datasets/remote/harmful_qa_dataset.py
Description: New remote dataset loader implementation for HarmfulQA -> SeedDataset/SeedPrompt conversion.

File: pyrit/datasets/seed_datasets/remote/__init__.py
Description: Re-exports the new loader so it is discoverable/registered alongside the other remote loaders.

File: tests/unit/datasets/test_harmful_qa_dataset.py
Description: Unit tests for fetching/conversion and dataset_name behavior.

Comment on lines +76 to +80
dataset_name=self.dataset_name,
harm_categories=[item["topic"]],
description=description,
source="https://huggingface.co/datasets/declare-lab/HarmfulQA",
authors=authors,

Copilot AI Mar 1, 2026


The source field is hard-coded to the declare-lab/HarmfulQA URL, but this loader exposes dataset_name as a constructor parameter (self.hf_dataset_name). If callers override dataset_name, the returned SeedPrompts will still claim the default source, which is misleading. Consider deriving source (and the HF URL) from self.hf_dataset_name, or accept an explicit source_url parameter.
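One way to address this comment is to derive the URL from the configured repo id instead of hard-coding it. This is an illustrative helper, not code from the PR; `hf_source_url` and its parameter name are hypothetical, and it assumes the standard HuggingFace dataset-page URL scheme.

```python
def hf_source_url(hf_dataset_name: str) -> str:
    """Derive the HuggingFace dataset page URL from the repo id, so the
    SeedPrompt `source` stays in sync if callers override the dataset name.
    (Hypothetical helper, not part of the PR.)"""
    return f"https://huggingface.co/datasets/{hf_dataset_name}"


print(hf_source_url("declare-lab/HarmfulQA"))
# https://huggingface.co/datasets/declare-lab/HarmfulQA
```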

source="https://huggingface.co/datasets/declare-lab/HarmfulQA",
authors=authors,
groups=["DeCLaRe Lab, Singapore University of Technology and Design"],
metadata={"subtopic": item.get("subtopic", "")},

Copilot AI Mar 1, 2026


groups=[item.get('subtopic', '')] will produce [''] when subtopic is missing/empty, which creates an empty-named group downstream. Use a conditional (e.g., [] when falsy) so groups is either a meaningful label or omitted.

Suggested change
- metadata={"subtopic": item.get("subtopic", "")},
+ metadata=(
+     {"subtopic": item["subtopic"]}
+     if item.get("subtopic")
+     else {}
+ ),
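The same guard applies to `groups`. A minimal sketch of the falsy-safe pattern the review suggests, extracted into a standalone function for illustration (the function name and return shape are hypothetical, not from the PR):

```python
def optional_groups_and_metadata(item: dict) -> tuple:
    """Return (groups, metadata) for a row, omitting both when `subtopic`
    is missing or blank, so downstream code never sees an empty-named group.
    (Illustrative helper mirroring the review suggestion.)"""
    subtopic = item.get("subtopic")
    groups = [subtopic] if subtopic else []
    metadata = {"subtopic": subtopic} if subtopic else {}
    return groups, metadata


print(optional_groups_and_metadata({"subtopic": ""}))            # ([], {})
print(optional_groups_and_metadata({"subtopic": "Toxicology"}))  # (['Toxicology'], {'subtopic': 'Toxicology'})
```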

Add remote dataset loader for HarmfulQA (declare-lab/HarmfulQA), containing ~2k
harmful questions organized by academic topic and subtopic for testing LLM
susceptibility to harm-inducing question-answering.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@romanlutz romanlutz force-pushed the romanlutz/add-harmful-qa-dataset branch from e996238 to d441180 Compare March 1, 2026 14:26