fix(consolidation): prevent orphan observations when source deleted mid-consolidation#1090

Merged
nicoloboschi merged 1 commit into main from fix/consolidation-delete-race
Apr 16, 2026
Conversation

@nicoloboschi
Collaborator

Summary

Closes a race between consolidation and memory deletion where an observation could be written referencing an already-deleted source memory.

The race: consolidation fetches source memory X, runs an LLM for several seconds, then inserts an observation with source_memory_ids = [X, ...]. If X is hard-deleted during the LLM call, the delete path's stale-observation sweep has already run and cannot see the not-yet-inserted row, leaving an orphan. source_memory_ids is a uuid[] column, so Postgres cannot cascade through it — cleanup is manual by necessity, which is why this is a latent bug in the current design.
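The problematic interleaving, as a pre-fix timeline sketch:

```text
consolidation                              delete path
-------------                              -----------
t0  SELECT source memory X (live)
t1  LLM call starts (seconds)
t2                                         hard-DELETE memory X
t3                                         stale-observation sweep -> finds nothing
t4  LLM call returns
t5  INSERT observation {source_memory_ids: [X, ...]}   <- orphan, never swept
```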

Fix (two coordinated changes):

  • consolidator.py: new _filter_live_source_memories helper runs SELECT ... FOR SHARE against memory_units inside the same transaction as the INSERT/UPDATE, dropping uuids whose rows have been deleted and blocking concurrent deletes until the write commits. _create_observation_directly and _execute_update_action call it before writing; if no live sources remain, the action is skipped ({"action": "skipped", "reason": "sources_deleted"} for creates; no-op return for updates).
  • memory_engine.py: delete_memory_unit, delete_document, and delete_bank(fact_type=...) now DELETE the source rows before calling _delete_stale_observations_for_memories, so under READ COMMITTED the post-delete sweep also catches any observation a concurrent consolidation committed in the window.

Together these give us: either the consolidator sees the source gone and filters it, or the delete sees the new observation and sweeps it.
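An in-memory sketch of why the delete-side reordering closes the window, with plain Python containers standing in for the memory_units and observation tables (names are illustrative, not the memory_engine.py API, and the sweep here assumes an observation with no surviving sources is deleted):

```python
from uuid import UUID


def delete_memory_unit(memories: dict, observations: list, memory_id: UUID) -> None:
    """Mirror the new ordering: remove the source row, then sweep."""
    # 1. the source row is gone *before* the sweep runs, unlike pre-fix
    memories.pop(memory_id, None)
    # 2. the sweep therefore also catches an observation that a concurrent
    #    consolidation committed after an earlier pre-delete sweep would
    #    have run; anything with no surviving source is dropped
    observations[:] = [
        obs for obs in observations
        if any(mid in memories for mid in obs["source_memory_ids"])
    ]
```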

Three regression tests in test_observation_invalidation.py exercise the consolidator helpers directly:

  • mixed live/dead source_memory_ids → observation created with only live sources
  • all-dead source_memory_ids → create skipped, no row written
  • all-dead new sources on update → update skipped, observation untouched
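The skip semantics those tests assert reduce to a small guard; this is an in-memory stand-in (the real helpers run against the database via the FOR SHARE filter):

```python
def create_observation(live_ids: set, source_memory_ids: list) -> dict:
    """Create-path guard: keep only live sources, skip when none remain."""
    live = [mid for mid in source_memory_ids if mid in live_ids]
    if not live:
        # mirrors the PR's {"action": "skipped", "reason": "sources_deleted"}
        return {"action": "skipped", "reason": "sources_deleted"}
    return {"action": "created", "source_memory_ids": live}
```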

Test plan

  • uv run pytest tests/test_observation_invalidation.py — 8/8 pass locally (3 new + 5 existing)
  • ./scripts/hooks/lint.sh — all passed
  • CI green

fix(consolidation): prevent orphan observations when source deleted mid-consolidation

Consolidation reads a source memory, calls an LLM for several seconds, then
writes an observation referencing that source. If the source memory was
hard-deleted during the LLM call, the observation landed referencing a
now-missing uuid — the delete's stale-observation sweep had already run and
could not see the not-yet-inserted row. source_memory_ids is a uuid[], so
Postgres cannot cascade through it; cleanup has to be done manually.

Two coordinated changes close the race:

- Consolidator filters source_memory_ids against live rows with SELECT ... FOR SHARE
  inside the same transaction as the INSERT/UPDATE, dropping any id whose row
  has already been deleted and blocking concurrent deletes until the write
  commits. Skips the create/update entirely when no live sources remain.
- Delete paths (delete_memory_unit, delete_document, delete_bank by fact_type)
  now DELETE the source rows first and run the stale-observation sweep
  afterwards, so any observation that was inserted concurrently is also
  caught by the sweep under READ COMMITTED.

Adds three regression tests exercising the consolidator helpers directly with
mixed live/dead and all-dead source_memory_ids.
@nicoloboschi merged commit f9042e3 into main on Apr 16, 2026
51 of 53 checks passed