Add calibration package checkpointing, target config, and hyperparameter CLI by baogorek · Pull Request #538 · PolicyEngine/policyengine-us-data

baogorek · 2026-02-17T17:19:48Z

Fixes #533
Fixes #534

Summary

Calibration package checkpointing: --build-only saves the expensive matrix build as a pickle, --package-path loads it for fast re-fitting with different hyperparameters or target sets
Target config YAML: Declarative exclusion rules (target_config.yaml) replace hardcoded target filtering; checked-in config reproduces the junkyard's 22 excluded groups
Hyperparameter CLI flags: --beta, --lambda-l2, --learning-rate are now tunable from the command line and Modal runner
Modal runner improvements: Streaming subprocess output, support for new flags
Documentation: docs/calibration.md covers all workflows (single-pass, build-then-fit, package re-filtering, Modal, portable fitting)
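
The build-then-fit split in the first bullet amounts to a pickle round-trip. The following is a minimal sketch of that idea, not the PR's implementation: `save_package`, `load_package`, and the package keys (`matrix`, `targets`) are hypothetical names chosen for illustration.

```python
import pickle
import tempfile
from pathlib import Path


def save_package(package: dict, path: Path) -> None:
    """Persist the expensive build output so later runs can skip the build."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(package, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_package(path: Path) -> dict:
    """Reload a previously saved package for re-fitting."""
    with path.open("rb") as f:
        return pickle.load(f)


# Hypothetical package contents: a tiny matrix plus target metadata.
package = {
    "matrix": [[1.0, 0.0], [0.0, 2.0]],
    "targets": [{"variable": "person_count", "value": 100.0}],
}

with tempfile.TemporaryDirectory() as tmp:
    package_path = Path(tmp) / "calibration" / "package.pkl"
    save_package(package, package_path)    # the role --build-only plays
    restored = load_package(package_path)  # the role --package-path plays

print(restored == package)
```

Because the saved object is independent of the fit settings, the same file can be reloaded with different hyperparameters or target filters without repeating the build.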

Note: This branch includes commits from #537 (PUF impute) since the calibration pipeline depends on that work. The calibration-specific changes are in the top commit.

Test plan

pytest policyengine_us_data/tests/test_calibration/test_unified_calibration.py — CLI arg parsing tests
pytest policyengine_us_data/tests/test_calibration/test_target_config.py — target config filtering + package round-trip tests
Manual: make calibrate-build produces package, --package-path loads it and fits
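
A CLI arg parsing test of the kind listed above could exercise a parser like the one below. This is a sketch under assumptions: the flag names come from the PR summary, but the types, the `None` defaults, and the `build_parser` helper are hypothetical and may differ from `unified_calibration.py`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing the flags named in the PR summary."""
    parser = argparse.ArgumentParser(description="Calibration CLI sketch")
    parser.add_argument("--build-only", action="store_true",
                        help="Build and save the package, then stop")
    parser.add_argument("--package-path", default=None,
                        help="Load a saved package instead of rebuilding")
    # Defaults of None are placeholders; the real defaults are not shown here.
    parser.add_argument("--beta", type=float, default=None)
    parser.add_argument("--lambda-l2", type=float, default=None)
    parser.add_argument("--learning-rate", type=float, default=None)
    return parser


args = build_parser().parse_args(
    ["--package-path", "package.pkl", "--beta", "0.5", "--learning-rate", "0.1"]
)
print(args.package_path, args.beta, args.lambda_l2, args.learning_rate)
```

argparse converts dashes in flag names to underscores, so `--lambda-l2` is read back as `args.lambda_l2`.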

🤖 Generated with Claude Code

juaristi22 · 2026-02-18T10:15:57Z

policyengine_us_data/calibration/unified_calibration.py

+        raw_data = source_sim.dataset.load_dataset()
+        data_dict = {}
+        for var in raw_data:
+            data_dict[var] = {2024: raw_data[var][...]}


this fails when trying to run calibration because load_dataset() returns dicts, not h5py datasets

Suggested change

-            data_dict[var] = {2024: raw_data[var][...]}
+            if isinstance(raw_data[var], dict):
+                vals = list(raw_data[var].values())
+                data_dict[var] = {2024: vals[0]}
+            else:
+                data_dict[var] = {2024: np.array(raw_data[var])}
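
The failure and the suggested fix can be reproduced in isolation. The sketch below is dependency-free, so a plain `list(...)` stands in for `np.array(...)` and a tuple stands in for an h5py dataset; `extract_values` and the sample data are hypothetical.

```python
def extract_values(entry):
    """Return the stored values whether `entry` is a dict or array-like."""
    if isinstance(entry, dict):
        # Period-keyed dict: take the first (assumed only) period's values.
        return list(entry.values())[0]
    # Array-like stand-in for an h5py dataset: materialize it.
    return list(entry)


raw_data = {
    "age": {2024: [34, 61, 7]},                  # dict form, as reported
    "employment_income": (50000.0, 0.0, 0.0),    # array-like form
}

# On a dict, [...] is an ordinary key lookup with Ellipsis as the key,
# so it raises KeyError instead of reading the whole dataset.
try:
    raw_data["age"][...]
    ellipsis_failed = False
except KeyError:
    ellipsis_failed = True

data_dict = {var: {2024: extract_values(raw_data[var])} for var in raw_data}
print(ellipsis_failed, data_dict)
```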

juaristi22 · 2026-02-18T10:53:12Z

policyengine_us_data/calibration/unified_calibration.py

Should we make this, and the other places where time_period=2024 is hardcoded, derive the time period from the dataset instead?
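
One way to do that, assuming the loaded data carries period-keyed dicts as in the comment above, is to read the period off the data itself. `infer_time_period` is a hypothetical helper, not an existing function in the repository.

```python
from typing import Optional


def infer_time_period(raw_data: dict, default: Optional[int] = None) -> int:
    """Infer the single period used as a key across period-keyed variables."""
    periods = {
        int(period)
        for entry in raw_data.values()
        if isinstance(entry, dict)
        for period in entry
    }
    if len(periods) == 1:
        return periods.pop()
    if not periods and default is not None:
        # No period-keyed entries at all: fall back to the caller's default.
        return default
    raise ValueError(
        f"Cannot infer a single time period from {sorted(periods)}"
    )


period = infer_time_period({"age": {2024: [34, 61]}, "rent": {"2024": [0, 900]}})
print(period)
```

Raising on mixed periods keeps a mismatch visible rather than silently picking one year.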

juaristi22 · 2026-02-18T11:01:27Z

policyengine_us_data/calibration/source_impute.py

Claude recommends adding new files like source_impute.py and puf_impute.py to the __init__.py file; probably wouldn't hurt, though not urgent.

juaristi22 · 2026-02-18T11:09:37Z

docs/calibration.md

+- `storage/calibration/unified_diagnostics.csv` --- per-target error report
+- `storage/calibration/unified_run_config.json` --- full run configuration
+
+### 2. Build-then-fit (recommended for iteration)


Would we want to support this option for the Modal runner as well? I think the Modal runner is currently not wired to build and save the calibration package, so this workflow could only be used for local / Kaggle notebook builds.

juaristi22 · 2026-02-18T11:41:35Z

policyengine_us_data/calibration/source_impute.py

        Person-level state FIPS array.
    """
    hh_ids_person = data.get("person_household_id", {}).get(time_period)
    if hh_ids_person is not None:


Will person_household_id ever not be available?
The fallback assumes every household has the same number of people, which could lead to wrong state assignments. We might be able to get rid of it altogether if we can safely assume that person_household_id will always be in the data.
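
The risk can be shown with a toy example. Both functions are hypothetical reconstructions based only on this comment (an ID-based lookup versus an equal-household-size assumption); they are not the code in source_impute.py.

```python
def person_state_fips(person_household_id, household_id, household_state_fips):
    """Map each person to their household's state via the household ID."""
    state_by_household = dict(zip(household_id, household_state_fips))
    return [state_by_household[hh] for hh in person_household_id]


def person_state_fips_equal_size(n_persons, household_state_fips):
    """Fallback-style assignment that assumes equal household sizes."""
    per_household = n_persons // len(household_state_fips)
    return [s for s in household_state_fips for _ in range(per_household)]


household_id = [10, 20]
household_state_fips = [6, 36]          # toy values: one state per household
person_household_id = [10, 10, 10, 20]  # household 10 has three people

by_lookup = person_state_fips(
    person_household_id, household_id, household_state_fips
)
by_fallback = person_state_fips_equal_size(
    len(person_household_id), household_state_fips
)
print(by_lookup)    # [6, 6, 6, 36]
print(by_fallback)  # [6, 6, 36, 36]
```

The third person is assigned to the wrong state under the equal-size assumption, which is the failure mode the comment describes.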

juaristi22

Minor comments, but generally LGTM. I was also able to run the calibration job on Modal (after removing the ellipsis in unified_calibration.py)!

Small note: if I'm not mistaken, this PR addresses issue #534. #310 seems to have been referenced there as something that would be addressed together, but this PR does not save calibration_log.csv among its outputs. Do we want to add it at this point?

…ter CLI
- Add build-only mode to save calibration matrix as pickle package
- Add target config YAML for declarative target exclusion rules
- Add CLI flags for beta, lambda_l2, learning_rate hyperparameters
- Add streaming subprocess output in Modal runner
- Add calibration pipeline documentation
- Add tests for target config filtering and CLI arg parsing
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Chunked training with per-target CSV log matching notebook format
- Wire --log-freq through CLI and Modal runner
- Create output directory if missing (fixes Modal container error)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
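
A rough sketch of chunked training with a per-target CSV log follows. Everything here is illustrative: `run_chunked`, the `fit_chunk` callback, and the column names are hypothetical, and the real log format (described as matching the notebook) is not shown in this PR page.

```python
import csv
import os
import tempfile


def run_chunked(total_epochs, chunk_size, log_path, targets, fit_chunk):
    """Train in chunks, logging one row per target after each chunk."""
    directory = os.path.dirname(log_path)
    if directory:
        os.makedirs(directory, exist_ok=True)  # create log dir if missing
    epochs_done = 0
    with open(log_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["epoch", "target", "estimate", "target_value"])
        while epochs_done < total_epochs:
            n = min(chunk_size, total_epochs - epochs_done)
            estimates = fit_chunk(n)
            epochs_done += n  # cumulative, so numbering never resets
            for name, target_value in targets.items():
                writer.writerow(
                    [epochs_done, name, estimates[name], target_value]
                )
    return epochs_done


targets = {"person_count": 100.0}
state = {"person_count": 0.0}


def fake_fit_chunk(n_epochs):
    """Stand-in for a real optimizer: move halfway to each target."""
    for name, target in targets.items():
        state[name] += (target - state[name]) / 2
    return dict(state)


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "new_dir", "calibration_log.csv")
    total = run_chunked(250, 100, log_path, targets, fake_fit_chunk)
    with open(log_path, newline="") as f:
        rows = list(csv.reader(f))

print(rows)
```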

- Set verbose_freq=chunk so epoch counts don't reset each chunk
- Rename: diagnostics -> unified_diagnostics.csv, epoch log -> calibration_log.csv (matches dashboard expectation)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Instead of creating a new Microsimulation per clone (~3 min each, 22 hours for 436 clones), precompute values for all 51 states on one sim object (~3 min total), then assemble per-clone values via numpy fancy indexing (~microseconds per clone). New methods: _build_state_values, _assemble_clone_values, _evaluate_constraints_from_values, _calculate_target_values_from_values. DEFAULT_N_CLONES raised to 436 for 5.2M record matrix builds. Takeup re-randomization deferred to future post-processing layer. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
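
The precompute-then-assemble idea can be sketched without a simulation engine. Plain lists stand in for the numpy arrays and fancy indexing the commit mentions; `precompute_by_state`, `assemble_clone`, and the toy benefit rule are hypothetical and are not the `_build_state_values` / `_assemble_clone_values` methods themselves.

```python
def precompute_by_state(states, records, compute):
    """Evaluate `compute` once per (state, record) pair, up front."""
    return {
        state: [compute(state, rec) for rec in records] for state in states
    }


def assemble_clone(state_values, clone_states):
    """For each record, pick the value precomputed for its assigned state."""
    return [state_values[state][i] for i, state in enumerate(clone_states)]


records = [100, 200, 300]    # toy per-record incomes
multiplier = {6: 1, 36: 2}   # toy state-specific rule


def toy_benefit(state, income):
    return income * multiplier[state]


state_values = precompute_by_state([6, 36], records, toy_benefit)

# Each clone assigns its own state to every record; no new simulation needed.
clone_a = assemble_clone(state_values, [6, 36, 6])
clone_b = assemble_clone(state_values, [36, 36, 6])
print(clone_a, clone_b)

# Cost of the per-clone approach quoted in the commit: 436 clones x 3 minutes.
hours_per_clone_approach = 436 * 3 / 60
print(round(hours_per_clone_approach, 1))  # 21.8, i.e. roughly 22 hours
```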

- Modal runner: add --package-volume flag to read calibration package from a Modal Volume instead of passing 2+ GB as a function argument
- unified_calibration: set PYTORCH_CUDA_ALLOC_CONF=expandable_segments to prevent CUDA memory fragmentation during L0 backward pass
- docs/calibration.md: rewrite to lead with lightweight build-then-fit workflow, document prerequisites, and add volume-based Modal usage
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
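
For reference, PyTorch documents this allocator option with the value syntax `expandable_segments:True`. The sketch below shows one way to apply it from Python; `with_expandable_segments` is a hypothetical helper, and the PR's actual code may set the variable differently.

```python
import os


def with_expandable_segments(base_env=None):
    """Return a copy of the environment with the allocator option set.

    An existing PYTORCH_CUDA_ALLOC_CONF value is left untouched.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")
    return env


# In a training script, apply the setting before importing torch so the
# CUDA caching allocator sees it when it initializes.
os.environ.update(with_expandable_segments())
print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```

The returned dict can also be passed as `env=` to `subprocess.Popen` when the fit runs in a child process.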

- target_config.yaml: exclude everything except person_count/age (~8,766 targets) to isolate fitting issues from zero-target and zero-row-sum problems in policy variables
- target_config_full.yaml: backup of the previous full config
- unified_calibration.py: set PYTORCH_CUDA_ALLOC_CONF=expandable_segments to fix CUDA memory fragmentation during backward pass
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- apply_target_config: support 'include' rules (keep only matching targets) in addition to 'exclude' rules; geo_level now optional
- target_config.yaml: 3-line include config replaces 90-line exclusion list for age demographics (person_count with age domain, ~8,784 targets)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
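
The include/exclude semantics described above could look roughly like the following. This is a sketch, not `apply_target_config`: the config is a Python dict rather than parsed YAML, and the rule keys (`variable`, `domain_variable`, `geo_level`) and the `filter_targets` helper are assumptions.

```python
def rule_matches(target: dict, rule: dict) -> bool:
    """A rule matches when every key it specifies equals the target's value."""
    return all(target.get(key) == value for key, value in rule.items())


def filter_targets(targets, config):
    """Keep targets matching any include rule, then drop excluded ones."""
    include = config.get("include", [])
    exclude = config.get("exclude", [])
    kept = []
    for target in targets:
        if include and not any(rule_matches(target, r) for r in include):
            continue
        if any(rule_matches(target, r) for r in exclude):
            continue
        kept.append(target)
    return kept


targets = [
    {"variable": "person_count", "domain_variable": "age", "geo_level": "district"},
    {"variable": "person_count", "domain_variable": "age", "geo_level": "state"},
    {"variable": "person_count", "domain_variable": None, "geo_level": "national"},
    {"variable": "snap", "domain_variable": None, "geo_level": "state"},
]

# A short include rule with no geo_level matches every geography.
age_only = filter_targets(
    targets,
    {"include": [{"variable": "person_count", "domain_variable": "age"}]},
)
print([t["geo_level"] for t in age_only])
```

Leaving `geo_level` out of a rule makes it match all geographic levels, which is how a short include list can replace a long exclusion list.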

The roth_ira_contributions target has zero row sum (no CPS records), making it impossible to calibrate. Remove it from target_config.yaml so Modal runs don't waste epochs on an unachievable target. Also adds `python -m policyengine_us_data.calibration.validate_package` CLI tool for pre-upload package validation, with automatic validation on --build-only runs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
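
The zero-row-sum condition is straightforward to check before a run. In the sketch below, rows are assumed to be targets and columns records, so a target's estimate is the weighted sum of its row; an all-zero row yields an estimate of zero for any weights. `unachievable_targets` and the toy data are hypothetical and are not the validate_package tool.

```python
def unachievable_targets(matrix, target_names, target_values):
    """Names of targets whose row is all zeros but whose target is non-zero."""
    flagged = []
    for name, row, value in zip(target_names, matrix, target_values):
        if sum(abs(x) for x in row) == 0 and value != 0:
            flagged.append(name)
    return flagged


# Toy matrix: rows are targets, columns are records.
matrix = [
    [1.0, 1.0, 0.0],   # some records contribute to this target
    [0.0, 0.0, 0.0],   # no record contributes: cannot be calibrated
]
names = ["person_count", "roth_ira_contributions"]
values = [100.0, 5.0]  # illustrative numbers, not real targets

flagged = unachievable_targets(matrix, names, values)
print(flagged)
```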

Achievability analysis showed 9 district-level IRS dollar variables have per-household values 5-27x too high in the extended CPS, making them irreconcilable with count targets (needed_w ~0.04-0.2 vs ~26). Drop salt, AGI, income_tax, dividend/interest vars, QBI deduction, taxable IRA distributions, income_tax_positive, traditional IRA. Add ACA PTC district targets (aca_ptc + tax_unit_count). Save calibration package BEFORE target_config filtering so the full matrix can be reused with different configs without rebuilding. Also: population-based initial weights from age targets per CD, cumulative epoch numbering in chunked logging. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

PUF cloning already happens upstream in extended_cps.py, so the --puf-dataset flag in the calibration pipeline was redundant (and would have doubled the data a second time). Removed the flag, _build_puf_cloned_dataset function, and all related params. Added 4 compatible national targets: child_support_expense, child_support_received, health_insurance_premiums_without_medicare_part_b, and rent (all needed_w 27-37, compatible with count targets at ~26). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

juaristi22 reviewed Feb 18, 2026

juaristi22 force-pushed the calibration-pipeline-improvements branch from 4c51b32 to 61523d8 on February 18, 2026 14:46

juaristi22 mentioned this pull request Feb 18, 2026: Category takeup rerandomization #540 (open, 5 tasks)

baogorek force-pushed the calibration-pipeline-improvements branch from 61523d8 to 6744481 on February 18, 2026 16:47

baogorek and others added 10 commits February 19, 2026 14:33

Ignore all calibration run outputs in storage/calibration/

f42e6aa

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add --lambda-l0 to Modal runner, fix load_dataset dict handling

29e53f9

The Modal calibration runner was missing --lambda-l0 passthrough. Also fix KeyError: Ellipsis when load_dataset() returns dicts instead of h5py datasets. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add --package-path support to Modal runner

a898ebc

Upload a pre-built calibration package to Modal and run only the fitting phase, skipping HuggingFace download and matrix build. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Create log directory before writing calibration log

fa7ebed

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add debug logging for CLI args and command in package path

13ec69c

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

baogorek force-pushed the calibration-pipeline-improvements branch from 59b27a8 to 0a0f167 on February 19, 2026 23:07

baogorek and others added 9 commits February 19, 2026 18:26

Switch target config to finest-grain include (~18K targets)

32c851b

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Fix at-large district geoid mismatch (7 districts had 0 estimates)

5a04c9f

Add population-based initial weights for L0 calibration

5cb6d86

fixing the stacked dataset builder

40ba0f2

baogorek wants to merge 19 commits into main from calibration-pipeline-improvements
