
Add Bootstrap Confidence Intervals for Attack Success Rates#1577

Merged
jmartin-tech merged 45 commits into NVIDIA:main from patriciapampanelli:feature/confidence-intervals
Mar 23, 2026

Conversation

@patriciapampanelli
Collaborator

Summary

Adds 95% bootstrap confidence intervals (CIs) to attack success rates, accounting for sampling variance and detector imperfection via the Rogan-Gladen correction.

Changes

  • New: bootstrap_ci.py, detector_metrics.py - CI calculation with Se/Sp correction
  • Modified: evaluators/base.py - CI integration into eval pipeline and output
  • Modified: report_digest.py - CI propagation through reports

Methodology

  1. Resampling: Draws 10,000 bootstrap samples from the binary pass/fail results (with replacement)
  2. Correction: Adjusts each sample's observed rate using the Rogan-Gladen formula to account for detector error
  3. Interval extraction: Takes the 2.5th and 97.5th percentiles as CI bounds

The correction formula:

P_true = (P_obs + Sp - 1) / (Se + Sp - 1)
  • P_obs = observed failure rate in the resampled data
  • Se = detector sensitivity (probability of detecting a true attack)
  • Sp = detector specificity (probability of correctly passing a benign response)

Requires ≥30 evaluated outputs per probe-detector pair; falls back to a perfect detector (Se=Sp=1.0) when detector metrics are unavailable.
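The three-step methodology above can be sketched as follows. This is an illustrative reimplementation, not the actual garak code; the function name, parameters, and defaults mirror the description in this PR (10,000 iterations, 95% level, ≥30 samples, perfect-detector fallback) but are otherwise assumptions.

```python
import numpy as np

def bootstrap_asr_ci(results, se=1.0, sp=1.0, n_iter=10_000,
                     confidence=0.95, min_samples=30, seed=None):
    """results: sequence of 0/1 outcomes (1 = attack succeeded)."""
    results = np.asarray(results)
    if results.size < min_samples:
        return None  # fewer than the required evaluated outputs
    rng = np.random.default_rng(seed)
    # 1. Resampling: draw bootstrap samples with replacement
    idx = rng.integers(0, results.size, size=(n_iter, results.size))
    p_obs = results[idx].mean(axis=1)
    # 2. Correction: Rogan-Gladen adjustment for detector error
    #    (se=sp=1.0 is the perfect-detector fallback: p_true == p_obs)
    p_true = np.clip((p_obs + sp - 1.0) / (se + sp - 1.0), 0.0, 1.0)
    # 3. Interval extraction: central percentiles
    alpha = 100 * (1.0 - confidence)
    lower, upper = np.percentile(p_true, [alpha / 2, 100 - alpha / 2])
    return float(lower), float(upper)
```

Note the resulting interval is generally asymmetric around the point estimate, which is relevant to the display discussion below.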

Statistical Limitations

  • Se/Sp treated as fixed (no propagation of detector-metric uncertainty)
  • Uses detector-level metrics only (not probe-specific); detector performance (Se/Sp) can vary depending on the probe.

Out of Scope

  • Probe-specific Se/Sp lookup

Collaborator

@erickgalinkin erickgalinkin left a comment

Would be nice to find a better way to print this. I'm mostly confident that this methodology can work, though I had trouble writing a formal proof that this gives us a true 95% CI.

Comment thread docs/source/reporting.rst Outdated
Comment on lines +42 to +43
During console output, attack success rates may include confidence intervals displayed as: ``(attack success rate: 45.23%) ± 2.15``.
The ± margin represents the 95% confidence interval half-width in percentage points.
Collaborator

Realistically, our + and - won't be evenly distributed. We almost universally have asymmetric CIs.

Collaborator Author

Absolutely, yes; they are already calculated asymmetrically. I'll correct how the CIs are displayed.

Collaborator Author

Done. Updated to bracketed format [lower%, upper%].
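A hypothetical sketch of what the display change described above might look like: report the asymmetric bounds directly in bracketed form rather than a symmetric ± half-width. The function name and signature are illustrative, not the actual garak code.

```python
def format_asr(rate: float, ci_lower=None, ci_upper=None) -> str:
    """Render a rate in [0, 1] plus an asymmetric CI when both bounds exist."""
    text = f"(attack success rate: {rate * 100:.2f}%)"
    if ci_lower is not None and ci_upper is not None:
        # bracketed [lower%, upper%] form; no symmetry assumption
        text += f" [{ci_lower * 100:.2f}%, {ci_upper * 100:.2f}%]"
    return text
```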

Comment thread garak/analyze/bootstrap_ci.py Outdated
p_obs = resampled_results.mean()

# Apply Se/Sp correction to get true ASR
# TODO: propagate detector metric uncertainty (requires Se/Sp CIs in detector_metrics_summary.json)
Collaborator

<3

Comment thread garak/analyze/bootstrap_ci.py
Comment thread garak/evaluators/base.py Outdated
Comment on lines +254 to +258
ci_text = (
    f" ± {(ci_upper - ci_lower) / 2:.2f}"
    if ci_lower is not None and ci_upper is not None
    else ""
)
Collaborator

Doesn't this assume even distribution? I understand there's some lossiness in printing it this way, but I'd think that if failrate is, for example, 100%, we'd want something more like:
ci_lower <= failrate? Hard to manage it, but I'm not completely sure how to avoid saying something like "100% ± 10%"

Collaborator

would love to do this based on model of distribution of probe:detector scores acquired during calibration, thus ditching the frequently-untrue even assumption

Collaborator

@leondz I have a separate research branch where I try a totally different calculation. Working on checking how different my bounds (which are derived from a nonparametric test on an empirical CDF) are compared to these.

Comment thread garak/evaluators/base.py Outdated
Collaborator

@leondz leondz left a comment

Shaping up well. Few minor requests around non-duplication and configuration. Larger questions about where this code belongs and how to support CI calculation beyond Evaluator.

Comment thread docs/source/reporting.rst
Comment thread docs/source/reporting.rst Outdated
Comment thread garak/analyze/bootstrap_ci.py Outdated
Comment on lines +16 to +17
    num_iterations: int = 10000,
    confidence_level: float = 0.95,
Collaborator

these should be configurable, propose in core config under reporting

Collaborator Author

Fixed. Now reads from _config.reporting.

Collaborator

Thanks!

The intent with _config is for objects to never read from it directly, but instead from a config parameter passed at instantiation. I think adherence to this pattern might block directly accessing _config in these methods, and then the question is where the data comes from. One solution might be to have the instantiated Evaluator - which is configured with access to those parameters - pass these values to this function; or even to pass this function its own config object. Could that make sense?

also paging @jmartin-tech for opinion

Comment thread garak/analyze/bootstrap_ci.py Outdated
Comment thread garak/evaluators/base.py Outdated
Comment thread garak/evaluators/base.py
Collaborator

I'd still like a super-simple CI for the general case that ignores detector performance, clamped to 0.0-1.0. We can estimate a CI for cases where we don't have extensive detector perf information, and we can do it quickly.

Could be configured in core via e.g. reporting.confidence_interval_method with values:

  • None - no confidence interval calc/display
  • bootstrap - bootstrap only
  • simple - simple only
  • backoff - bootstrap where we can, simple in the gaps

backoff might be a bit much for this week, but some pattern like this is where I'd like this to go
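A minimal sketch of what the "simple" option floated above could be, here using a Wilson score interval (one standard quick binomial CI that ignores detector performance; the exact method intended for garak is not specified in this thread). Its bounds never escape [0.0, 1.0], which also addresses the "100% ± 10%" display concern raised earlier.

```python
import math
from statistics import NormalDist

def simple_ci(successes: int, n: int, confidence: float = 0.95):
    """Wilson score interval for a binomial proportion; bounds stay in [0, 1]."""
    if n == 0:
        return None
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # clamp defensively, per the 0.0-1.0 requirement above
    return max(0.0, centre - half), min(1.0, centre + half)
```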

Collaborator

It's not clear to me how that's a real CI and I wonder how we'd manage it, exactly? I worry that a simple CI without a basis isn't truly a CI.

We also have nonparametric CI ready as a fast-follow for probe/detector pairs where we have calibration data.

@leondz leondz added the reporting Reporting, analysis, and other per-run result functions label Jan 28, 2026
@leondz leondz self-assigned this Feb 3, 2026
Collaborator

@leondz leondz left a comment

Getting there - a few more comments and tweaks

Comment thread docs/source/configurable.rst Outdated
Comment thread garak/resources/garak.core.yaml Outdated
Collaborator

all the jinja has gone in main now!

Collaborator Author

Hi @leondz! Should I:

  1. Update this PR to work with the new React format, or
  2. Split display changes into a separate PR after this merges?

Collaborator

As long as this PR places all the required information in the jsonl, it is reasonable to make exposing the data in a report a separate PR.

Collaborator

Note: if the changes were incompatible with existing reporting, such as by removing or renaming a digest entry, that would be the line requiring an update in the same PR.

Collaborator

Update this PR to work with the new React format, or
Split display changes into a separate PR after this merges?

split display changes into a separate PR, definitely

Comment thread garak/analyze/ci_calculator.py Outdated
"No eval_threshold found in setup entry for %s, using default 0.5",
report_file
)
eval_threshold = 0.5
Collaborator

would expect default to be taken from run.eval_threshold config

Collaborator Author

Done. Changed ci_calculator.py to use run.eval_threshold from config instead of hardcoded 0.5.

Collaborator

has this value now gone from analyze.ci_calculator?

Comment thread garak/analyze/ci_calculator.py Outdated
Comment thread garak/evaluators/base.py Outdated
if _config.system.show_z:
    self.calibration = garak.analyze.calibration.Calibration()

ci_method = getattr(_config.reporting, 'confidence_interval_method', "None")
Collaborator

Prefer NoneType here

Collaborator Author

Done
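What the "Prefer NoneType" request resolves to, as a sketch (the SimpleNamespace stand-in for _config.reporting is illustrative): default to the None object rather than the string "None", so downstream checks can use `is None` instead of string comparison.

```python
from types import SimpleNamespace

reporting = SimpleNamespace()  # stand-in for _config.reporting with no CI setting

# default to the NoneType singleton, not the string "None"
ci_method = getattr(reporting, "confidence_interval_method", None)
if ci_method is None:
    ci_enabled = False  # skip CI calculation entirely
else:
    ci_enabled = True
```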

Comment thread garak/evaluators/base.py
Comment thread garak/evaluators/base.py Outdated
Comment thread garak/evaluators/base.py
Comment thread garak/cli.py Outdated
patriciapampanelli and others added 19 commits February 6, 2026 10:13
@patriciapampanelli
Collaborator Author

@leondz Re the eval_threshold question: yes, that value is no longer in analyze.ci_calculator. The module neither reads nor defaults eval_threshold; the calculator just reads the resulting aggregates from the digest.

Collaborator

@erickgalinkin erickgalinkin left a comment

Good first pass, I'm into it!

Comment thread docs/source/reporting.rst
np.random.seed(_config.run.seed)

denominator = sensitivity + specificity - 1.0
if abs(denominator) < 0.01:
Collaborator

Why abs? Are we ok with a negative denominator?
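To illustrate the concern above numerically (an unguarded toy version, not the garak code): when Se + Sp - 1 is negative the detector is worse than chance and the Rogan-Gladen estimate inverts, mapping higher observed rates to lower "true" rates, so an abs() guard treats a qualitatively different failure mode the same as a near-zero denominator.

```python
def rogan_gladen(p_obs: float, se: float, sp: float) -> float:
    """Raw Rogan-Gladen estimate: no clamping, no denominator guard."""
    return (p_obs + sp - 1.0) / (se + sp - 1.0)
```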

Collaborator

@erickgalinkin erickgalinkin left a comment

I'm ok with this in its current form. We can get some additional methods in with not too much work, I think.


Collaborator

@jmartin-tech jmartin-tech left a comment

Testing leads me to some basic usage expectations questions.

By having no default setting, the reports maintain backwards compatibility; however, is this the desired way for this to land? I am open to this but am interested in a clear reasoning for that expectation.

I also note argument parsing expectations in the documentation that I don't see reflected in the code change. The suggested options do not refer to exposing the core confidence_interval_method, which I would expect to be the more common item to pass.

  • Is the expectation that a --rebuild_cis would only rebuild the same metric type?
  • Should confidence_interval_method expect to be overridden if --config was passed containing a reporting section that differs from the original report file when --rebuild_cis was passed?
  • Should the bootstrap_confidence_level, bootstrap_num_iterations, and bootstrap_min_sample_size options be offered via cli at all or should this base on requiring a file passed via a --config combination on the cli to get something other than the embedded values in the report?
  • If the existing report did not have any options set, is --rebuild_cis expected to just report that nothing was calculated, or should it report that no confidence interval configuration was found?

Comment thread garak/evaluators/base.py
Comment thread docs/source/configurable.rst
Comment thread docs/source/reporting.rst
Comment on lines +45 to +48
.. code-block:: bash

garak --rebuild_cis report.jsonl --bootstrap_num_iterations 50000 --bootstrap_confidence_level 0.99

Collaborator

This is odd, no cli code change actually provided support for these additional values.

Collaborator

This new feature impacts how eval entries are formatted in reports; I don't see that change reflected as updates to static test _assets. I would expect some of the existing files to be updated, as well as new tests that validate what happens when you have a mix of formats in reports used to aggregate or digest jsonl files.

patriciapampanelli and others added 10 commits March 4, 2026 14:40
@patriciapampanelli patriciapampanelli force-pushed the feature/confidence-intervals branch from 9826004 to e1a5d90 Compare March 7, 2026 00:03
Collaborator

@jmartin-tech jmartin-tech left a comment

Testing shows the rebuild_cis option does not work as I expect: running it against a report that had no CI intervals (confidence_interval_method: none set via config), while requesting bootstrap-based results, produces a revised digest that still does not contain "confidence" values. I suspect this is because rebuild_cis_for_report does not write the eval entry changes or the new config values to the report before rebuilding the digest.

I see code that expects to rewrite eval lines, and still have not debugged enough to determine why they are not updated. However, I do not see anything that would update the "start_run setup" entry with updated config values; though I suspect that should not impact the calculations, it may result in inconsistent files that have differing "start_run setup" data between the config first line of the file and the last-line digest[meta].

I am also a bit surprised that the option writes a new html report automatically.

Editing the input file additionally seems problematic. I suspect this option may need a guard like the digest_report helper, which requires an explicit flag to overwrite the input file. Based on these thoughts, I am wondering if we should consider shifting the workflow to be more like digest_report as a helper tool vs exposing it as a cli option.

What does and does not belong as exposed cli capabilities vs analysis tools is still a bit of a grey area.

update_eval_entries_with_ci,
)

existing = _extract_ci_config_from_report(str(report_file))
Collaborator

Consider: the rebuild_cis_for_report method should also read the previous config from the "start_run setup" entry, to report what (if anything) differs from the original config.

If something in config has changed, that should also be reflected in the final output report.

Comment thread garak/analyze/rebuild_cis.py Outdated
return 0

print(f"📊 Updating {len(ci_results)} probe/detector pairs with new CIs")
update_eval_entries_with_ci(str(report_file), ci_results)
Collaborator

This does not look to be actually updating the entries in the file.

I am also not sold this should overwrite the input file by default.

Comment thread garak/analyze/rebuild_cis.py Outdated
Comment on lines +85 to +88
html_report = build_html(digest, _config)
with open(html_output, "w", encoding="utf-8") as htmlfile:
    htmlfile.write(html_report)
print(f"📄 HTML digest written to {html_output}")
Collaborator

Currently the html report does not actually change; should we defer this and expect the user to run digest_report on the new jsonl instead of performing it automatically? Note this could overwrite an existing file that has not been explicitly referenced in the cli command.

I can see users failing to back up the html version of the file before calling for a rebuild, and not expecting to have the html file changed.

@jmartin-tech jmartin-tech merged commit 9da79fa into NVIDIA:main Mar 23, 2026
15 checks passed
@github-actions github-actions Bot locked and limited conversation to collaborators Mar 23, 2026
@patriciapampanelli patriciapampanelli deleted the feature/confidence-intervals branch March 24, 2026 18:06