
Add Bootstrap Confidence Intervals for Attack Success Rates#1577

Merged
jmartin-tech merged 45 commits into NVIDIA:main from patriciapampanelli:feature/confidence-intervals
Mar 23, 2026

Conversation

@patriciapampanelli
Collaborator

Summary

Adds 95% bootstrap confidence intervals (CIs) to attack success rates, accounting for sampling variance and detector imperfection via the Rogan-Gladen correction.

Changes

  • New: bootstrap_ci.py, detector_metrics.py - CI calculation with Se/Sp correction
  • Modified: evaluators/base.py - CI integration into eval pipeline and output
  • Modified: report_digest.py - CI propagation through reports

Methodology

  1. Resampling: Draws 10,000 bootstrap samples from the binary pass/fail results (with replacement)
  2. Correction: Adjusts each sample's observed rate using the Rogan-Gladen formula to account for detector error
  3. Interval extraction: Takes the 2.5th and 97.5th percentiles as CI bounds

The correction formula:

P_true = (P_obs + Sp - 1) / (Se + Sp - 1)
  • P_obs = observed failure rate in the resampled data
  • Se = detector sensitivity (probability of detecting a true attack)
  • Sp = detector specificity (probability of correctly passing a benign response)

Requires ≥30 evaluated outputs per probe-detector pair; falls back to a perfect detector (Se=Sp=1.0) when detector metrics are unavailable.
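The three-step methodology above can be sketched as follows. This is an illustrative reimplementation, not the actual garak code; the function name, parameters, and defaults mirror the description in this PR (10,000 iterations, 95% level, ≥30 samples, perfect-detector fallback) but are otherwise assumptions.

```python
import numpy as np

def bootstrap_asr_ci(results, se=1.0, sp=1.0, n_iter=10_000,
                     confidence=0.95, min_samples=30, seed=None):
    """results: sequence of 0/1 outcomes (1 = attack succeeded)."""
    results = np.asarray(results)
    if results.size < min_samples:
        return None  # fewer than the required evaluated outputs
    rng = np.random.default_rng(seed)
    # 1. Resampling: draw bootstrap samples with replacement
    idx = rng.integers(0, results.size, size=(n_iter, results.size))
    p_obs = results[idx].mean(axis=1)
    # 2. Correction: Rogan-Gladen adjustment for detector error
    #    (se=sp=1.0 is the perfect-detector fallback: p_true == p_obs)
    p_true = np.clip((p_obs + sp - 1.0) / (se + sp - 1.0), 0.0, 1.0)
    # 3. Interval extraction: central percentiles
    alpha = 100 * (1.0 - confidence)
    lower, upper = np.percentile(p_true, [alpha / 2, 100 - alpha / 2])
    return float(lower), float(upper)
```

Note the resulting interval is generally asymmetric around the point estimate, which is relevant to the display discussion below.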

Statistical Limitations

  • Se/Sp treated as fixed (no propagation of detector-metric uncertainty)
  • Uses detector-level metrics only (not probe-specific); detector performance (Se/Sp) can vary depending on the probe.

Out of Scope

  • Probe-specific Se/Sp lookup

Collaborator

@erickgalinkin erickgalinkin left a comment

Would be nice to find a better way to print this. I'm mostly confident that this methodology can work, though I had trouble writing a formal proof that this gives us a true 95% CI.

Comment thread docs/source/reporting.rst Outdated
Comment on lines +42 to +43
During console output, attack success rates may include confidence intervals displayed as: ``(attack success rate: 45.23%) ± 2.15``.
The ± margin represents the 95% confidence interval half-width in percentage points.
Collaborator

Realistically, our + and - won't be evenly distributed. We almost universally have asymmetric CIs.

Collaborator Author

Absolutely, yes; they are already calculated asymmetrically. I'll correct how the CIs are displayed.

Collaborator Author

Done. Updated to bracketed format [lower%, upper%].
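A hypothetical sketch of what the display change described above might look like: report the asymmetric bounds directly in bracketed form rather than a symmetric ± half-width. The function name and signature are illustrative, not the actual garak code.

```python
def format_asr(rate: float, ci_lower=None, ci_upper=None) -> str:
    """Render a rate in [0, 1] plus an asymmetric CI when both bounds exist."""
    text = f"(attack success rate: {rate * 100:.2f}%)"
    if ci_lower is not None and ci_upper is not None:
        # bracketed [lower%, upper%] form; no symmetry assumption
        text += f" [{ci_lower * 100:.2f}%, {ci_upper * 100:.2f}%]"
    return text
```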

Comment thread garak/analyze/bootstrap_ci.py Outdated
p_obs = resampled_results.mean()

# Apply Se/Sp correction to get true ASR
# TODO: propagate detector metric uncertainty (requires Se/Sp CIs in detector_metrics_summary.json)
Collaborator

<3

Comment thread garak/analyze/bootstrap_ci.py
Comment thread garak/evaluators/base.py Outdated
Comment on lines +254 to +258
ci_text = (
    f" ± {(ci_upper - ci_lower) / 2:.2f}"
    if ci_lower is not None and ci_upper is not None
    else ""
)
Collaborator

Doesn't this assume even distribution? I understand there's some lossiness in printing it this way, but I'd think that if failrate is, for example, 100%, we'd want something more like:
ci_lower <= failrate? Hard to manage it, but I'm not completely sure how to avoid saying something like "100% ± 10%"

Collaborator

would love to do this based on model of distribution of probe:detector scores acquired during calibration, thus ditching the frequently-untrue even assumption

Collaborator

@leondz I have a separate research branch where I try a totally different calculation. Working on checking how different my bounds (which are derived from a nonparametric test on an empirical CDF) are compared to these.

Comment thread garak/evaluators/base.py Outdated
Collaborator

@leondz leondz left a comment

Shaping up well. Few minor requests around non-duplication and configuration. Larger questions about where this code belongs and how to support CI calculation beyond Evaluator.

Comment thread docs/source/reporting.rst
Comment thread docs/source/reporting.rst Outdated
Comment thread garak/analyze/bootstrap_ci.py Outdated
Comment on lines +16 to +17
    num_iterations: int = 10000,
    confidence_level: float = 0.95,
Collaborator

these should be configurable, propose in core config under reporting

Collaborator Author

Fixed. Now reads from _config.reporting.

Collaborator

Thanks!

The intent with _config is for objects to never read from it directly, but instead from a config parameter passed at instantiation. I think adherence to this pattern might block directly accessing _config in these methods, and then the question is where the data comes from. One solution might be to have the instantiated Evaluator - which is configured with access to those parameters - pass these values to this function; or even to pass this function its own config object. Could that make sense?

also paging @jmartin-tech for opinion

Comment thread garak/analyze/bootstrap_ci.py Outdated
Comment thread garak/evaluators/base.py Outdated
Comment thread garak/evaluators/base.py
Collaborator

I'd still like a super-simple CI for the general case that ignores detector performance, clamped to 0.0-1.0. We can estimate a CI for cases where we don't have extensive detector perf information, and we can do it quickly.

Could be configured in core via e.g. reporting.confidence_interval_method with values:

  • None - no confidence interval calc/display
  • bootstrap - bootstrap only
  • simple - simple only
  • backoff - bootstrap where we can, simple in the gaps

backoff might be a bit much for this week, but some pattern like this is where I'd like this to go
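A minimal sketch of what the "simple" option floated above could be, here using a Wilson score interval (one standard quick binomial CI that ignores detector performance; the exact method intended for garak is not specified in this thread). Its bounds never escape [0.0, 1.0], which also addresses the "100% ± 10%" display concern raised earlier.

```python
import math
from statistics import NormalDist

def simple_ci(successes: int, n: int, confidence: float = 0.95):
    """Wilson score interval for a binomial proportion; bounds stay in [0, 1]."""
    if n == 0:
        return None
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # clamp defensively, per the 0.0-1.0 requirement above
    return max(0.0, centre - half), min(1.0, centre + half)
```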

Collaborator

It's not clear to me how that's a real CI and I wonder how we'd manage it, exactly? I worry that a simple CI without a basis isn't truly a CI.

We also have nonparametric CI ready as a fast-follow for probe/detector pairs where we have calibration data.

@leondz leondz added the reporting Reporting, analysis, and other per-run result functions label Jan 28, 2026
@leondz leondz self-assigned this Feb 3, 2026
Collaborator

@leondz leondz left a comment

Getting there - a few more comments and tweaks

Comment thread docs/source/configurable.rst Outdated
Comment thread garak/resources/garak.core.yaml Outdated
Collaborator

all the jinja has gone in main now!

Collaborator Author

Hi @leondz! Should I:

  1. Update this PR to work with the new React format, or
  2. Split display changes into a separate PR after this merges?

Collaborator

As long as this PR places all the required information in the jsonl, it is reasonable to make exposing the data in a report a separate PR.

Collaborator

Note: if the changes were incompatible with existing reporting, such as by removing or renaming a digest entry, that would be the line requiring an update in the same PR.

Collaborator

Update this PR to work with the new React format, or
Split display changes into a separate PR after this merges?

split display changes into a separate PR, definitely

Comment thread garak/analyze/ci_calculator.py Outdated
"No eval_threshold found in setup entry for %s, using default 0.5",
report_file
)
eval_threshold = 0.5
Collaborator

would expect default to be taken from run.eval_threshold config

Collaborator Author

Done. Changed ci_calculator.py to use run.eval_threshold from config instead of hardcoded 0.5.

Collaborator

has this value now gone from analyze.ci_calculator?

Comment thread garak/analyze/ci_calculator.py Outdated
Comment thread garak/evaluators/base.py Outdated
if _config.system.show_z:
    self.calibration = garak.analyze.calibration.Calibration()

ci_method = getattr(_config.reporting, 'confidence_interval_method', "None")
Collaborator

Prefer NoneType here

Collaborator Author

Done
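What the "Prefer NoneType" request resolves to, as a sketch (the SimpleNamespace stand-in for _config.reporting is illustrative): default to the None object rather than the string "None", so downstream checks can use `is None` instead of string comparison.

```python
from types import SimpleNamespace

reporting = SimpleNamespace()  # stand-in for _config.reporting with no CI setting

# default to the NoneType singleton, not the string "None"
ci_method = getattr(reporting, "confidence_interval_method", None)
if ci_method is None:
    ci_enabled = False  # skip CI calculation entirely
else:
    ci_enabled = True
```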

Comment thread garak/evaluators/base.py
Comment thread garak/evaluators/base.py Outdated
Comment thread garak/evaluators/base.py
Comment thread garak/cli.py Outdated
patriciapampanelli and others added 19 commits February 6, 2026 10:13
@patriciapampanelli
Collaborator Author

@leondz Re the eval_threshold question: yes, that value is no longer in analyze.ci_calculator. The module neither reads nor defaults eval_threshold; the calculator just reads the resulting aggregates from the digest.

Collaborator

@erickgalinkin erickgalinkin left a comment

Good first pass, I'm into it!

Comment thread docs/source/reporting.rst
np.random.seed(_config.run.seed)

denominator = sensitivity + specificity - 1.0
if abs(denominator) < 0.01:
Collaborator

Why abs? Are we ok with a negative denominator?
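To illustrate the concern above numerically (an unguarded toy version, not the garak code): when Se + Sp - 1 is negative the detector is worse than chance and the Rogan-Gladen estimate inverts, mapping higher observed rates to lower "true" rates, so an abs() guard treats a qualitatively different failure mode the same as a near-zero denominator.

```python
def rogan_gladen(p_obs: float, se: float, sp: float) -> float:
    """Raw Rogan-Gladen estimate: no clamping, no denominator guard."""
    return (p_obs + sp - 1.0) / (se + sp - 1.0)
```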

Collaborator

@erickgalinkin erickgalinkin left a comment

I'm ok with this in its current form. We can get some additional methods in with not too much work, I think.


Collaborator

@jmartin-tech jmartin-tech left a comment

Testing leads me to some basic usage expectations questions.

By having no default setting, the reports maintain backwards compatibility; however, is this the desired way for this to land? I am open to this but am interested in a clear reasoning for that expectation.

I also note argument parsing expectations in the documentation that I don't see reflected in the code change. The suggested options do not refer to exposing the core confidence_interval_method, which I would expect to be the more common item to pass.

  • Is the expectation that a --rebuild_cis would only rebuild the same metric type?
  • Should confidence_interval_method expect to be overridden if --config was passed containing a reporting section that differs from the original report file when --rebuild_cis was passed?
  • Should the bootstrap_confidence_level, bootstrap_num_iterations, and bootstrap_min_sample_size options be offered via cli at all or should this base on requiring a file passed via a --config combination on the cli to get something other than the embedded values in the report?
  • If the existing report did not have any options set, is --rebuild_cis expected to just report that nothing was calculated, or should it report that no confidence interval configuration was found?

Comment thread garak/evaluators/base.py
Comment thread docs/source/configurable.rst
Comment thread docs/source/reporting.rst
Comment on lines +45 to +48
.. code-block:: bash

garak --rebuild_cis report.jsonl --bootstrap_num_iterations 50000 --bootstrap_confidence_level 0.99

Collaborator

This is odd, no cli code change actually provided support for these additional values.

Collaborator

This new feature impacts how eval entries are formatted in reports; I don't see that change reflected as updates to static test _assets. I would expect some of the existing files to be updated, as well as new tests that validate what happens when you have a mix of formats in reports used to aggregate or digest jsonl files.

patriciapampanelli and others added 10 commits March 4, 2026 14:40
@patriciapampanelli patriciapampanelli force-pushed the feature/confidence-intervals branch from 9826004 to e1a5d90 Compare March 7, 2026 00:03
Collaborator

@jmartin-tech jmartin-tech left a comment

Testing shows the rebuild_cis option does not work as I expect: running it against a report that had no CI intervals (confidence_interval_method: none set via config), while requesting bootstrap-based results, produces a revised digest that still does not contain "confidence" values. I suspect this is because rebuild_cis_for_report does not write the eval entry changes or the new config values to the report before rebuilding the digest.

I see code that expects to rewrite eval lines, and still have not debugged enough to determine why they are not updated. However, I do not see anything that would update the "start_run setup" entry with updated config values; though I suspect that should not impact the calculations, it may result in inconsistent files that have differing "start_run setup" data between the config first line of the file and the last-line digest[meta].

I am also a bit surprised that the option writes a new html report automatically.

Editing the input file additionally seems problematic. I suspect this option may need a guard like the digest_report helper, which requires an explicit flag to overwrite the input file. Based on these thoughts, I am wondering if we should consider shifting the workflow to be more like digest_report as a helper tool vs exposing it as a cli option.

What does and does not belong as exposed cli capabilities vs analysis tools is still a bit of a grey area.

update_eval_entries_with_ci,
)

existing = _extract_ci_config_from_report(str(report_file))
Collaborator

Consider: the rebuild_cis_for_report method should also read the previous config from the "start_run setup" entry, to report what (if anything) differs from the original config.

If something in config has changed, that should also be reflected in the final output report.

Comment thread garak/analyze/rebuild_cis.py Outdated
return 0

print(f"📊 Updating {len(ci_results)} probe/detector pairs with new CIs")
update_eval_entries_with_ci(str(report_file), ci_results)
Collaborator

This does not look to be actually updating the entries in the file.

I am also not sold this should overwrite the input file by default.

Comment thread garak/analyze/rebuild_cis.py Outdated
Comment on lines +85 to +88
html_report = build_html(digest, _config)
with open(html_output, "w", encoding="utf-8") as htmlfile:
    htmlfile.write(html_report)
print(f"📄 HTML digest written to {html_output}")
Collaborator

Currently the html report does not actually change; should we defer this and expect the user to run digest_report on the new jsonl instead of performing it automatically? Note this could overwrite an existing file that has not been explicitly referenced in the cli command.

I can see users failing to back up the html version of the file before calling for a rebuild, and not expecting to have the html file changed.

@jmartin-tech jmartin-tech merged commit 9da79fa into NVIDIA:main Mar 23, 2026
15 checks passed
@github-actions github-actions Bot locked and limited conversation to collaborators Mar 23, 2026
@patriciapampanelli patriciapampanelli deleted the feature/confidence-intervals branch March 24, 2026 18:06