
Add Misogyny Probe & Detector to LMRC Framework#1565

Open
peter216 wants to merge 2 commits into NVIDIA:main from peter216:detector_misogyny-update

Conversation


@peter216 peter216 commented Jan 19, 2026

Overview

Replaces PR #1541

This PR adds support for detecting and testing misogynistic language in LLMs through the Language Model Risk Cards (LMRC) framework.

It uses a Hugging Face model for contextual detection, adds a lightweight keyword indicator detector, and updates the probe/tests to match the behavior.

Update: model replaced following license review

The original contextual detector used MilaNLProc/bert-base-uncased-ear-misogyny, which carries a GPLv3 license incompatible with garak's Apache 2.0 license (flagged by @jmartin-tech; confirmed the authors were also contacted and a compatible license was not achievable).

The detector now uses cardiffnlp/twitter-roberta-base-hate-latest (CC-BY-4.0), maintained by Cardiff NLP and trained on 13 hate-speech datasets including the hatEval "hate against women" corpus and the "Call me sexist, but" dataset.

Known limitation: the cardiffnlp model detects overt hate speech reliably but does not detect subtle misogyny expressed as implicit bias or personal opinion (e.g. "she's too emotional to be a leader" scores NOT-HATE). The original MilaNLProc model had better coverage of those cases. The two alternative models evaluated — annahaz/xlm-roberta-base-misogyny-sexism-indomain-mix-bal (MIT) and NLP-LTU/bertweet-large-sexism-detector (no license file) — both missed the same class of subtle framing. This tradeoff was accepted: cardiffnlp's license is compatible, its download volume (~37k/month) signals broad community validation, and the coverage gap is documented here and in the detector docstring.

One integration test sentence was updated accordingly: "She's too emotional to be a leader." → "You're acting like a girl." The original sentence is preserved in a comment with an explanation.

Changes

Files Added

  • tests/detectors/test_detectors_lmrc.py — Comprehensive detector unit tests
  • tests/probes/test_probes_lmrc.py — Comprehensive probe unit tests

Files Modified

  • garak/probes/lmrc.py — Added Misogyny probe class
  • garak/detectors/lmrc.py — Added Misogyny and MisogynyKeywords detector classes
  • resources/plugin_cache.json — Added probes.lmrc.Misogyny

Detector + Probe

  • garak/detectors/lmrc.py
    • Misogyny uses HFDetector with cardiffnlp/twitter-roberta-base-hate-latest (CC-BY-4.0), detector_target_class: "HATE"
    • Docstring documents the subtle-misogyny coverage gap relative to the original model
    • MisogynyKeywords retained as a lightweight keyword indicator detector
  • garak/probes/lmrc.py
    • Updated prompts for the Misogyny probe
    • Added extended_detectors = ["lmrc.MisogynyKeywords"]
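The configuration pattern described above can be sketched as follows. Note this is an illustrative stand-in, not garak's real code: the `HFDetector` class below is a minimal stub (the real one lives in garak's detector base classes), and only the dict-merge idiom, the two parameter names, and their values are taken from this PR (the same pattern is visible in the snippet quoted later in the review thread).

```python
# Sketch of the DEFAULT_PARAMS composition described above.
# HFDetector here is a STUB, not garak's real base class; only the
# dict-merge idiom and the two parameter names/values come from the PR.

class HFDetector:  # stub for garak's Hugging Face detector base class
    DEFAULT_PARAMS = {
        "hf_args": {"device": "cpu"},  # illustrative placeholder default
    }

class Misogyny(HFDetector):
    """Contextual misogyny detection via a Hugging Face hate-speech model.

    Known limitation (per the PR description): the cardiffnlp model catches
    overt hate speech but misses subtle misogyny framed as implicit bias.
    """
    # PEP 584 dict union: subclass params extend the base-class defaults
    DEFAULT_PARAMS = HFDetector.DEFAULT_PARAMS | {
        "detector_model_path": "cardiffnlp/twitter-roberta-base-hate-latest",
        "detector_target_class": "HATE",
    }
```

The dict union keeps base-class defaults (like device settings) while layering on the model path and target class, so subclasses only declare what differs.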

Tests

  • Test Documentation
  • tests/detectors/test_detectors_lmrc.py — targets MisogynyKeywords as the unit-testable indicator detector
  • tests/langservice/detectors/test_detectors_misogyny.py — HF integration tests gated by storage requirements; test sentence updated with coverage-gap rationale in comments
  • tests/probes/test_probes_lmrc.py — asserts MisogynyKeywords is included in extended_detectors
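The extended_detectors assertion mentioned in the last bullet might look roughly like this. The probe class below is a self-contained stub rather than an import of garak.probes.lmrc, so only the shape of the test is illustrative; the real suite would assert against the actual Misogyny probe class.

```python
# Illustrative shape of the extended_detectors unit test described above.
# The real test would import the probe from garak.probes.lmrc; a stub
# stands in here so the example runs on its own.

class Misogyny:  # stub for garak.probes.lmrc.Misogyny
    extended_detectors = ["lmrc.MisogynyKeywords"]

def test_misogyny_extended_detectors():
    # the lightweight keyword detector must be wired in as an extended detector
    assert "lmrc.MisogynyKeywords" in Misogyny.extended_detectors

test_misogyny_extended_detectors()
```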

Run tests:

python -m pytest tests/detectors/test_detectors_lmrc.py tests/probes/test_probes_lmrc.py -v
python -m pytest tests/langservice/detectors/test_detectors_misogyny.py -v  # requires ~500 MB storage

Compatibility

  • No breaking changes: the probe name (lmrc.Misogyny) is unchanged
  • Model footprint reduced: cardiffnlp (~500 MB) vs. original (~2 GB)

@peter216
Author

I have read the DCO Document and I hereby sign the DCO

@peter216
Author

@leondz I've been busy, sorry for the late post. Any interest or feedback on this one? Thanks!

Comment thread on garak/detectors/lmrc.py:

    """Contextual misogyny detection via MilaNLProc/bert-base-uncased-ear-misogyny"""

    DEFAULT_PARAMS = HFDetector.DEFAULT_PARAMS | {
        "detector_model_path": "MilaNLProc/bert-base-uncased-ear-misogyny",
Collaborator

The license for this model (listed as GPL v3) is not compatible with this project.

Author

Understood. On it. Thanks.

Collaborator

We sent a request for a license adjustment to the model authors, to see whether this can be addressed from that side as well.

Author

Great! I'll hold off until we see if they respond.

Collaborator

Unfortunately, the model license does not look like it can easily be made compatible; we may need another source.

Signed-off-by: peter216 <peter216@users.noreply.github.com>
@peter216
Author

Ok, on it. Thanks.
