Conversation
- Create `lcm.persistence` module with `save_solution`/`load_solution` (extracted `_atomic_dump` from `simulation/result.py`) - Add `debug_path` and `keep_n_latest` params to `solve()`, `simulate()`, and `solve_and_simulate()` for auto-persisting intermediate results to disk - Rename `debug_mode` to `debug` across all public APIs - Make cloudpickle a hard dependency (remove all try/import guards) - Enhance `validate_value_function_array` with regime name and NaN count in errors - Add debugging user guide with recipes (disable JIT, auto-persist, optimagic NaN debugging, inspecting value functions, understanding error messages) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Code reviewFound 2 issues:
Lines 54 to 56 in 32ce4a3
Lines 716 to 718 in 32ce4a3 Generated with Claude Code - If this code review was useful, please react with 👍. Otherwise, react with 👎. |
…log a complete reproducer.
timmens
left a comment
There was a problem hiding this comment.
Looks very good. I have a few comments. Some of the issues are only somewhat related to this PR, but the changes here make them easier to spot.
| seed = draw_random_seed() | ||
|
|
||
| logger.info("Starting simulation") | ||
| has_multiple_regimes = len(internal_regimes) > 1 |
There was a problem hiding this comment.
Don't we always have at least two regimes?
|
|
||
| # Check for NaN/Inf in V_arr | ||
| if jnp.any(jnp.isnan(result.V_arr)) or jnp.any(jnp.isinf(result.V_arr)): | ||
| logger.warning( | ||
| "NaN/Inf in V_arr for regime '%s' at age %s", regime_name, age | ||
| ) | ||
|
|
||
| subject_regime_ids = new_subject_regime_ids | ||
|
|
||
| # Log regime transition counts at debug level | ||
| if has_multiple_regimes and logger.isEnabledFor(logging.DEBUG): | ||
| _log_regime_transitions( | ||
| logger=logger, | ||
| prev_regime_ids=prev_regime_ids, | ||
| new_regime_ids=subject_regime_ids, | ||
| ids_to_names=ids_to_names, | ||
| ) | ||
|
|
||
| elapsed = time.monotonic() - period_start | ||
| if has_multiple_regimes: | ||
| logger.info( | ||
| "Age: %s regimes=%d (%.1fs)", | ||
| age, | ||
| len(active_regimes), | ||
| elapsed, | ||
| ) | ||
| else: | ||
| logger.info("Age: %s (%.1fs)", age, elapsed) | ||
|
|
||
| total_elapsed = time.monotonic() - total_start | ||
| logger.info("Simulation complete (%.1fs)", total_elapsed) | ||
|
|
There was a problem hiding this comment.
The simulation module / functions are already pretty logic packed. Should we maybe pull this out into a logging function / warning function that hides the internals?
|
|
||
| # Unwrap partials before renaming, then re-wrap — this avoids rename_arguments | ||
| # seeing the already-bound keywords and is simpler than letting dags handle it. | ||
| if isinstance(func, functools.partial): | ||
| renamed = rename_arguments(func.func, mapper=mapper) | ||
| return cast( | ||
| "InternalUserFunction", | ||
| functools.partial(renamed, *func.args, **func.keywords), | ||
| ) |
There was a problem hiding this comment.
If any of func.keywords overlap with mapper, the rebound keywords will reference the old (pre-rename) parameter names on the renamed function. Should the keywords be renamed too?
renamed_keywords = {mapper.get(k, k): v for k, v in func.keywords.items()}
functools.partial(renamed, *func.args, **renamed_keywords)|
|
||
|
|
||
| @dataclass(frozen=True) | ||
| class SolveAndSimulateSnapshot: |
There was a problem hiding this comment.
This class is identical to the SimulateSnapshot class.
There was a problem hiding this comment.
Maybe we should also remove the solve_and_simulate logic by just having simulate where you can either pass the pre-computed vf_arr or not; and if not it behaves as solve_and_simulate.
| ) | ||
|
|
||
| logger.info("Starting solution") | ||
| has_multiple_regimes = len(internal_regimes) > 1 |
There was a problem hiding this comment.
Same question as in simulate.py
| save_solve_snapshot( | ||
| model=self, | ||
| params=params, | ||
| V_arr_dict=V_arr_dict, | ||
| log_path=Path(log_path), | ||
| log_keep_n_latest=log_keep_n_latest, |
There was a problem hiding this comment.
Have you checked whether this takes a long time for large models?
If so, could we not return V_arr_dict already while doing the snapshot in the background? Same holds for simulation snapshot.
| ) | ||
| return result | ||
|
|
||
| def solve_and_simulate( |
There was a problem hiding this comment.
- If the log level is on debug; will solve_and_simulate log both the SolutionSnapshot and the SimulationSnapshot, and is that not redundant then?
- Also here thinking again, we should probably merge solve_and_simulate and simulate.
There was a problem hiding this comment.
Why is there this file (test_debug_persistence) and the file "test_persistence.py"?
There was a problem hiding this comment.
It feels like this exists because of ttsim / gettsim. Maybe we want to be explicit then? We could even add ttsim to the testing environment and write a test with it, no?
| ] | ||
| dynamic = [ "version" ] | ||
| dependencies = [ | ||
| "cloudpickle>=3.1.2,<4", |
Fixes #97.
Fixes #142
Replace the boolean
debug_modeflag with a structured logging/persistence systemthat makes it easier to diagnose solve and simulation failures — especially NaN value
functions during parameter estimation.
Logging rework
debug_mode: bool→log_level: LogLevelonsolve(),simulate(), andsolve_and_simulate(). Four levels:"off""warning""progress""debug"Per-period timing in both solve and simulate (elapsed seconds, active regime count).
Debug snapshots
When
log_level="debug", complete snapshots are saved tolog_path:Three snapshot types:
SolveSnapshot,SimulateSnapshot,SolveAndSimulateSnapshot. Load withload_snapshot(path, exclude=(...))toskip large fields.
Automatic retention:
log_keep_n_latest=3(default) deletes oldest snapshots.Better error messages
InvalidValueFunctionErrornow includes regime name and NaN count(
"3 of 100 values are NaN in regime 'working_life'")Validation improvements
fixing shape mismatches with
MappingLeafparameterscodes decoded to labels, plus constraint details and grid bounds
Other
SimulationResult.to_pickle()/.from_pickle()for serialisationsave_solution()/load_solution()for value function arraysbatches by module):
test_solution_on_toy_model.py→test_solution_on_toy_model_deterministic.py+test_solution_on_toy_model_stochastic.py(speed whentest_shock_grids.py→ grid tests stay, draw tests split intotest_shock_draw.py@functools.cachefor faster test suitesDocumentation
docs/user_guide/debugging.md) — log levels,snapshot structure, NaN diagnosis recipes, value function inspection patterns