Offload the Adam optimisation loop to Modal T4 GPU containers. Both calibrations (650 constituencies, 360 LAs) run in parallel on separate containers, so wall time becomes max(c_time, la_time) rather than the sum.
- Extract _run_optimisation() helper from calibrate.py (device-agnostic)
- Add modal_calibrate.py: Modal app wrapping the GPU loop
- create_datasets.py: dispatch to Modal when MODAL_CALIBRATE=1, CPU fallback otherwise
- push.yaml / pull_request.yaml: set MODAL_CALIBRATE=1 + token secrets
- pyproject.toml: add modal to dev extras
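The wall-time claim in the commit message above can be illustrated with a minimal sketch. It uses the standard library's thread pool as a stand-in for Modal containers. `fake_calibration`, `run_both_in_parallel`, and the durations are hypothetical and are not taken from this PR.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_calibration(name: str, seconds: float) -> str:
    """Hypothetical stand-in for one remote calibration job."""
    time.sleep(seconds)
    return f"{name} done"


def run_both_in_parallel(c_time: float, la_time: float):
    """Start both jobs before waiting on either, then collect the results."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=2) as pool:
        fut_c = pool.submit(fake_calibration, "constituencies", c_time)
        fut_la = pool.submit(fake_calibration, "local authorities", la_time)
        # Only block once both jobs are already running.
        results = (fut_c.result(), fut_la.result())
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = run_both_in_parallel(0.3, 0.3)
    print(results, f"elapsed {elapsed:.2f}s")
```

Submitting both jobs before calling `.result()` on either is what keeps the elapsed time near the longer job rather than the sum of the two.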
- Run black on all three changed files
- Call frs.copy() before passing dataset to matrix functions in Modal path, matching what calibrate_local_areas does internally
- Add changelog_entry.yaml
Build and serialise constituency arrays then del before building LA arrays, rather than holding both Microsimulations in memory at once.
… peak memory
Previously args_c (serialised constituency matrices, several hundred MB) was held in memory while building the LA Microsimulation, causing OOM on the GitHub Actions runner (exit 143). Spawn fut_c immediately inside app.run(), then del arrays before building LA matrices. Also widen vehicle ownership test tolerance to 0.20 until a freshly calibrated dataset is published to HuggingFace.
Build national matrix, serialise to bytes, del + gc before building the local matrix — ensuring only one Microsimulation is live at a time. Previously both national and constituency Microsimulations were alive simultaneously, causing OOM (exit 143) on the 7 GB CI runner.
Previously create_national_target_matrix was called twice (once for constituencies, once for LAs), each creating a full Microsimulation. Now it's called once on the original frs (no copy needed), serialised to bytes, and the same bytes reused for both Modal spawns. Peak memory during the spawn loop is now: frs + one local Microsimulation (no duplicate national Microsimulation), which matches the CPU path.
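The build-once, serialise, then free pattern described in the commits above can be sketched as follows. This is an illustration only: the PR says the matrix is "serialised to bytes" but does not say how, so `pickle` is an assumption, and `build_national_matrix` and `serialise_then_free` are hypothetical names.

```python
import gc
import pickle


def build_national_matrix(n_rows: int = 1000, n_cols: int = 5):
    """Hypothetical stand-in for an expensive matrix build."""
    return [[float(r * n_cols + c) for c in range(n_cols)] for r in range(n_rows)]


def serialise_then_free() -> bytes:
    """Build once, serialise to bytes, then drop the large object."""
    matrix = build_national_matrix()
    payload = pickle.dumps(matrix, protocol=pickle.HIGHEST_PROTOCOL)
    del matrix  # the local name held the only reference to the matrix
    gc.collect()  # also clears any reference cycles before the next build
    return payload


if __name__ == "__main__":
    national_bytes = serialise_then_free()
    # The same bytes object is handed to both jobs; nothing is rebuilt.
    args_c = ("constituencies", national_bytes)
    args_la = ("local authorities", national_bytes)
    assert args_c[1] is args_la[1]
    print(len(pickle.loads(national_bytes)), "rows round-tripped")
```

Only the compact bytes payload survives the function call, so the next large object can be built without the previous one still occupying memory.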
…me_period TypeError
…mport in container
…ation logs
run_calibration now returns [(epoch, weights_bytes)] every 10 epochs, matching the CPU path. _build_log replays these locally via get_performance to produce the same constituency/la calibration_log.csv format the dashboard expects.
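A minimal sketch of the checkpoint-and-replay idea follows. It is not the PR's code: the toy update step, the float64 packing, the choice to checkpoint when `epoch % 10 == 0`, and the names `run_toy_loop` and `build_log` are all assumptions made for illustration.

```python
import struct


def pack_weights(weights):
    """Serialise a list of floats as little-endian float64 bytes."""
    return struct.pack(f"<{len(weights)}d", *weights)


def unpack_weights(blob):
    """Inverse of pack_weights: 8 bytes per float64 value."""
    return list(struct.unpack(f"<{len(blob) // 8}d", blob))


def run_toy_loop(epochs=30, every=10):
    """Hypothetical remote loop returning [(epoch, weights_bytes)]."""
    weights = [1.0, 1.0, 1.0]
    checkpoints = []
    for epoch in range(epochs):
        weights = [w * 0.5 for w in weights]  # placeholder for an optimiser step
        if epoch % every == 0:
            checkpoints.append((epoch, pack_weights(weights)))
    return checkpoints


def build_log(checkpoints, metric):
    """Replay checkpoints locally, producing one log row per checkpoint."""
    return [
        {"epoch": epoch, "metric": metric(unpack_weights(blob))}
        for epoch, blob in checkpoints
    ]


if __name__ == "__main__":
    for row in build_log(run_toy_loop(), metric=sum):
        print(row)
```

Returning raw weight snapshots keeps the remote loop free of project imports, while the evaluation that needs the full target matrices runs locally.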
Add run_imputation Modal function (cpu=8, memory=16GB, no GPU) that runs the full imputation + uprating pipeline inside a container with policyengine-uk-data installed. The CI runner just sends the raw FRS bytes, receives the imputed FRS bytes back, and proceeds to calibration. CPU path (no MODAL_CALIBRATE) is unchanged for local use.
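The environment-variable dispatch between the Modal path and the unchanged CPU path can be sketched as below. The function names are hypothetical, no Modal call is made, and treating only the exact value "1" as enabling Modal is an assumption; the PR does not say how other values are handled.

```python
import os


def calibrate_on_cpu(payload: bytes) -> str:
    """Hypothetical stand-in for the existing local CPU path."""
    return f"cpu:{len(payload)}"


def calibrate_on_modal(payload: bytes) -> str:
    """Hypothetical stand-in for the remote path; nothing is sent anywhere."""
    return f"modal:{len(payload)}"


def use_modal(environ=None) -> bool:
    """True only when MODAL_CALIBRATE is exactly "1" (an assumption)."""
    env = os.environ if environ is None else environ
    return env.get("MODAL_CALIBRATE") == "1"


def calibrate(payload: bytes, environ=None) -> str:
    """Pick the runner from the environment, defaulting to the CPU path."""
    runner = calibrate_on_modal if use_modal(environ) else calibrate_on_cpu
    return runner(payload)


if __name__ == "__main__":
    print(calibrate(b"raw-frs-bytes"))
```

Defaulting to the CPU runner means a local run with no environment variable set behaves as it did before the Modal path existed.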
Free each target matrix DataFrame immediately after serialising to bytes, keeping only column metadata for post-Modal log reconstruction. This prevents three Microsimulation objects' data from sitting in memory simultaneously while building national + constituency + LA matrices.
eb0a848 to e175559
…licyengine_uk_data __init__
Summary
- … MODAL_CALIBRATE is not set

Changes
- calibrate.py: extract _run_optimisation() helper (device-agnostic); calibrate_local_areas delegates to it on CPU as before
- modal_calibrate.py (new): Modal app with run_calibration function on gpu="T4", self-contained loop (no policyengine imports in container)
- create_datasets.py: when MODAL_CALIBRATE=1, build arrays locally, .spawn() both GPU jobs before waiting on either, then write .h5 files
- push.yaml / pull_request.yaml: add MODAL_CALIBRATE=1 + MODAL_TOKEN_ID / MODAL_TOKEN_SECRET secrets to Build datasets step
- pyproject.toml: add modal to dev extras

Test plan
- … MODAL_CALIBRATE=1 (tokens set as repo secrets)
- … make data without MODAL_CALIBRATE) is unchanged
- … .h5 weight files are produced and make test passes