
Add Modal GPU calibration #279

Open
nwoodruff-co wants to merge 24 commits into main from feature/modal-gpu-calibration

Conversation

@nwoodruff-co
Collaborator

Summary

  • Offload the Adam optimisation loop to Modal T4 GPU containers
  • Both calibrations (650 constituencies, 360 LAs) run in parallel, so wall time is max(c_time, la_time) rather than the sum
  • CPU fallback unchanged when MODAL_CALIBRATE is not set
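
The parallel dispatch described above can be illustrated with a stdlib sketch. This uses threads and a fake sleep-based job purely to show the wall-time effect; the real code uses Modal's `.spawn()`/`.get()` on GPU containers, and `fake_calibration` and the timings here are hypothetical stand-ins:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_calibration(name: str, seconds: float) -> str:
    # Stand-in for one Modal GPU job; sleeps to simulate optimisation time.
    time.sleep(seconds)
    return f"{name} done"

start = time.monotonic()
with ThreadPoolExecutor(max_workers=2) as pool:
    # Spawn both jobs before waiting on either, as create_datasets.py does
    # with Modal's .spawn().
    fut_c = pool.submit(fake_calibration, "constituencies", 0.5)
    fut_la = pool.submit(fake_calibration, "local_areas", 0.5)
    results = [fut_c.result(), fut_la.result()]
elapsed = time.monotonic() - start  # ~max(0.5, 0.5) s, not the 1.0 s sum
```

Because both futures are submitted before either result is awaited, total wall time tracks the slower job rather than the sum of the two.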

Changes

  • calibrate.py: extract _run_optimisation() helper (device-agnostic); calibrate_local_areas delegates to it on CPU as before
  • modal_calibrate.py (new): Modal app with run_calibration function on gpu="T4", self-contained loop (no policyengine imports in container)
  • create_datasets.py: when MODAL_CALIBRATE=1, build arrays locally, .spawn() both GPU jobs before waiting on either, then write .h5 files
  • push.yaml / pull_request.yaml: add MODAL_CALIBRATE=1 + MODAL_TOKEN_ID/MODAL_TOKEN_SECRET secrets to Build datasets step
  • pyproject.toml: add modal to dev extras
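
The device-agnostic idea behind the extracted `_run_optimisation()` helper can be sketched with a minimal scalar Adam loop. This is pure Python rather than the tensor code the repo actually uses, and the loss, hyperparameters, and function name here are illustrative, not the real signature:

```python
import math

def run_optimisation(grad_fn, w0: float, steps: int = 300,
                     lr: float = 0.1, b1: float = 0.9,
                     b2: float = 0.999, eps: float = 1e-8) -> float:
    """Scalar Adam loop: the same maths applies on CPU or GPU tensors."""
    w, m, v = w0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = b1 * m + (1 - b1) * g        # first-moment estimate
        v = b2 * v + (1 - b2) * g * g    # second-moment estimate
        m_hat = m / (1 - b1 ** t)        # bias-corrected moments
        v_hat = v / (1 - b2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

# Minimise (w - 3)^2; the gradient is 2 * (w - 3).
w_star = run_optimisation(lambda w: 2 * (w - 3), w0=0.0)
```

Keeping the loop free of device-specific calls is what lets the same helper run on the CPU fallback and inside the T4 container.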

Test plan

  • CI passes with MODAL_CALIBRATE=1 (tokens set as repo secrets)
  • Local CPU run (make data without MODAL_CALIBRATE) is unchanged
  • Both .h5 weight files are produced and make test passes

@nwoodruff-co nwoodruff-co marked this pull request as ready for review February 19, 2026 18:05
Offload the Adam optimisation loop to Modal T4 GPU containers. Both
calibrations (650 constituencies, 360 LAs) run in parallel on separate
containers, so wall time becomes max(c_time, la_time) rather than the sum.

- Extract _run_optimisation() helper from calibrate.py (device-agnostic)
- Add modal_calibrate.py: Modal app wrapping the GPU loop
- create_datasets.py: dispatch to Modal when MODAL_CALIBRATE=1, CPU fallback otherwise
- push.yaml / pull_request.yaml: set MODAL_CALIBRATE=1 + token secrets
- pyproject.toml: add modal to dev extras
- Run black on all three changed files
- Call frs.copy() before passing dataset to matrix functions in Modal
  path, matching what calibrate_local_areas does internally
- Add changelog_entry.yaml
Build and serialise the constituency arrays, then del them before building
the LA arrays, rather than holding both Microsimulations in memory at once.
… peak memory

Previously args_c (the serialised constituency matrices, several hundred MB)
was held in memory while building the LA Microsimulation, causing OOM on
the GitHub Actions runner (exit 143). Now fut_c is spawned immediately
inside app.run(), and the arrays are deleted before the LA matrices are built.

Also widen vehicle ownership test tolerance to 0.20 until a freshly
calibrated dataset is published to HuggingFace.
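
The serialise-then-free discipline described above can be sketched in stdlib terms. Here `pickle` stands in for the real serialisation, and `build_constituency_arrays` and the sizes are hypothetical stand-ins for the Microsimulation-backed matrix build:

```python
import gc
import pickle

def build_constituency_arrays() -> list[list[float]]:
    # Stand-in for the expensive Microsimulation-backed matrix build.
    return [[float(i)] * 100 for i in range(100)]

# 1. Build and serialise the constituency arrays.
arrays_c = build_constituency_arrays()
args_c = pickle.dumps(arrays_c)

# 2. Free the in-memory arrays *before* the LA build, so both sets of
#    arrays are never resident at once (the cause of the exit-143 OOM).
del arrays_c
gc.collect()

# 3. The LA matrices can now be built while args_c is held only as bytes;
#    the GPU job deserialises on its own side.
restored = pickle.loads(args_c)
```

The peak-memory win comes from step 2: the bytes blob is compact and opaque, while the live object graph it replaces is what the CI runner could not afford to hold twice.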
Build the national matrix, serialise it to bytes, then del + gc before
building the local matrix, so only one Microsimulation is live at a time.
Previously both the national and constituency Microsimulations were alive
simultaneously, causing OOM (exit 143) on the 7 GB CI runner.
Previously create_national_target_matrix was called twice (once for
constituencies, once for LAs), each creating a full Microsimulation.
Now it's called once on the original frs (no copy needed), serialised
to bytes, and the same bytes reused for both Modal spawns.

Peak memory during the spawn loop is now: frs + one local Microsimulation
(no duplicate national Microsimulation), which matches the CPU path.
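
The build-once, reuse-everywhere pattern can be sketched as follows; `create_national_target_matrix` mirrors the real function's name, but its body, the data, and `spawn_calibration` are hypothetical stand-ins:

```python
import pickle

def create_national_target_matrix() -> dict[str, list[float]]:
    # Stand-in for the real builder, which spins up a full Microsimulation.
    return {"income_tax": [1.0, 2.0], "benefits": [3.0, 4.0]}

# Build once, serialise once.
national_bytes = pickle.dumps(create_national_target_matrix())

def spawn_calibration(area: str, matrix_bytes: bytes) -> int:
    # Stand-in for a Modal spawn: each job deserialises its own copy.
    matrix = pickle.loads(matrix_bytes)
    return len(matrix)

# Hand the same bytes to both calibration jobs.
n_c = spawn_calibration("constituencies", national_bytes)
n_la = spawn_calibration("local_areas", national_bytes)
```

Serialising once and sharing the bytes avoids the second Microsimulation entirely, which is where the duplicate memory cost came from.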
…ation logs

run_calibration now returns [(epoch, weights_bytes)] snapshots every 10
epochs, matching the CPU path. _build_log replays these locally via
get_performance to produce the same constituency/la calibration_log.csv
format the dashboard expects.
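
The checkpoint-and-replay idea can be sketched with stdlib code. The update rule, the scoring function, and the `build_log` name here are hypothetical stand-ins for the real optimisation step and get_performance:

```python
import pickle

def run_calibration(n_epochs: int) -> list[tuple[int, bytes]]:
    # Inside the container: snapshot (epoch, weights_bytes) every 10
    # epochs, mirroring what the CPU path logs.
    snapshots = []
    weights = [1.0, 1.0]
    for epoch in range(n_epochs):
        weights = [w * 0.99 for w in weights]  # stand-in for one Adam step
        if epoch % 10 == 0:
            snapshots.append((epoch, pickle.dumps(weights)))
    return snapshots

def build_log(snapshots: list[tuple[int, bytes]]) -> list[dict]:
    # Back on the runner: replay snapshots to rebuild the log rows.
    def get_performance(weights: list[float]) -> float:
        return sum(weights)  # hypothetical scoring function
    return [{"epoch": e, "score": get_performance(pickle.loads(b))}
            for e, b in snapshots]

log = build_log(run_calibration(30))
```

Shipping compact weight snapshots and rescoring locally keeps the container's return payload small while still producing the per-epoch log the dashboard reads.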
Add run_imputation Modal function (cpu=8, memory=16GB, no GPU) that
runs the full imputation + uprating pipeline inside a container with
policyengine-uk-data installed. The CI runner just sends the raw FRS
bytes, receives the imputed FRS bytes back, and proceeds to calibration.

CPU path (no MODAL_CALIBRATE) is unchanged for local use.
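
The bytes-in, bytes-out contract between the CI runner and the container can be sketched as below; `run_imputation` matches the real function's name, but its body and the data are trivial stand-ins for the actual imputation + uprating pipeline:

```python
import pickle

def run_imputation(raw_frs_bytes: bytes) -> bytes:
    # Inside the container: deserialise, impute, re-serialise.
    frs = pickle.loads(raw_frs_bytes)
    # Stand-in imputation: derive a new column from an existing one.
    frs["imputed_rent"] = [h * 0.5 for h in frs["household_income"]]
    return pickle.dumps(frs)

# On the CI runner: send the raw FRS bytes, receive the imputed bytes back.
raw = pickle.dumps({"household_income": [100.0, 200.0]})
imputed = pickle.loads(run_imputation(raw))
```

Keeping the interface to opaque bytes means the runner needs no policyengine imports of its own; only the container carries the heavy dependencies.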
Free each target matrix DataFrame immediately after serialising to bytes,
keeping only column metadata for post-Modal log reconstruction. This
prevents three Microsimulation objects' data from sitting in memory
simultaneously while building national + constituency + LA matrices.
@nwoodruff-co nwoodruff-co force-pushed the feature/modal-gpu-calibration branch from eb0a848 to e175559 Compare February 20, 2026 14:28