
fix(session): avoid O(n²) semantic re-processing on memory commit#809

Open
sliverp wants to merge 1 commit into volcengine:main from sliverp:fix/incremental-memory-semantic-processing

Conversation


@sliverp sliverp commented Mar 20, 2026

Problem

When a new memory is extracted via session.commit(), the session unconditionally enqueues a SemanticMsg that carries no change information (line 315 of session.py). As a result, _process_memory_directory() re-summarises and re-vectorises every file in the memory directory, whether or not it changed.

The cumulative cost grows as O(n²) with memory count:

  • Store 1st memory → process 1 file
  • Store 100th memory → process 100 files
  • Store 500th memory → process 500 files
  • Total = 1 + 2 + ... + n = O(n²)

Real-world impact: ~500 memories consumed hundreds of thousands of embedding tokens per day on Volcengine, with 2000+ rate-limit (429) retries.
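The quadratic growth can be sketched with a toy cost model (illustrative names only, not the project's API):

```python
# Toy cost model of the bug: each commit re-processes every file in the
# memory directory, so cumulative work is 1 + 2 + ... + n = n(n+1)/2.

def cumulative_files_processed(n_memories: int) -> int:
    """Total files summarised/vectorised after n commits under the bug."""
    total = 0
    for commit_index in range(1, n_memories + 1):
        # The fallback SemanticMsg carries no change set, so the
        # processor walks every file currently in the directory.
        total += commit_index
    return total

def incremental_files_processed(n_memories: int) -> int:
    """Same workload with per-file change sets: one new file per commit."""
    return n_memories

print(cumulative_files_processed(500))   # 125250 files processed in total
print(incremental_files_processed(500))  # 500
```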

Root Cause

Two interacting issues:

  1. In session.py (line 315), commit_async() always enqueues a full-directory SemanticMsg, even when SessionCompressor._flush_semantic_operations() has already enqueued incremental messages with per-file change sets.

  2. In semantic_processor.py (line 369), _process_memory_directory() only attempts to reuse existing summaries from .overview.md when msg.changes is not None. When the fallback path sends a message without changes, every file gets re-summarised.

Fix

  1. session.py: Only enqueue the fallback SemanticMsg when the compressor is absent or extracted 0 memories. When the compressor runs successfully, it already handles incremental semantic processing.

  2. semantic_processor.py: Always try to load existing summaries from .overview.md, regardless of whether msg.changes is set. This ensures even the fallback/redo path can skip unchanged files.

Tests

Added 4 unit tests in tests/unit/session/test_incremental_semantic.py:

  • test_commit_skips_fallback_semantic_when_compressor_flushed — verifies no duplicate SemanticMsg
  • test_commit_enqueues_fallback_semantic_when_no_compressor — verifies fallback still works
  • test_commit_enqueues_fallback_when_compressor_extracts_zero — edge case
  • test_semantic_msg_changes_none_by_default — documents the default
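Schematically, the two main tests assert behaviour like the following (using a hypothetical stand-in for the guard; the real tests drive session.commit_async()):

```python
# Stand-in for the fixed fallback logic (hypothetical helper, not the
# real session API): fallback only fires without a successful compressor.
def enqueue_fallback_if_needed(queue, compressor, extracted_count):
    if compressor is None or extracted_count == 0:
        queue.append("fallback SemanticMsg")

def test_commit_skips_fallback_semantic_when_compressor_flushed():
    queue = []
    enqueue_fallback_if_needed(queue, compressor=object(), extracted_count=2)
    assert queue == []  # no duplicate full-directory message

def test_commit_enqueues_fallback_semantic_when_no_compressor():
    queue = []
    enqueue_fallback_if_needed(queue, compressor=None, extracted_count=0)
    assert queue == ["fallback SemanticMsg"]

test_commit_skips_fallback_semantic_when_compressor_flushed()
test_commit_enqueues_fallback_semantic_when_no_compressor()
print("ok")
```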

Impact

For a workspace with N memories, this reduces per-commit semantic processing from O(N) file summaries down to O(changed_files), and eliminates the O(N²) cumulative cost growth.

Closes #505
Ref #744

When SessionCompressor successfully extracts memories it already
enqueues incremental SemanticMsg(s) with per-file change sets via
_flush_semantic_operations().  The session.commit_async() fallback was
unconditionally enqueueing a *second* SemanticMsg without any change
info, causing the semantic processor to re-summarise and re-vectorise
every file in the memory directory on every commit.

Cost impact: cumulative token usage grew as O(n²) with memory count —
500 memories produced ~250K embedding tokens/day with 2000+ rate-limit
retries.

Changes:
1. session.py: Only enqueue fallback SemanticMsg when compressor is
   absent or extracted 0 memories.
2. semantic_processor.py: Always try to load existing .overview.md
   summaries regardless of whether msg.changes is set, so even
   the fallback path can skip unchanged files.
3. Add 4 unit tests covering both paths.

Closes volcengine#505
Ref volcengine#744

CLAassistant commented Mar 20, 2026

CLA assistant check
All committers have signed the CLA.

@qin-ctx qin-ctx requested a review from myysy March 20, 2026 06:20
Collaborator

myysy commented Mar 20, 2026

Thanks for the PR. The final semantic step in commit_async targets the session path, which is different from the memory path flushed by the compressor, so it can’t be removed or skipped. We’ll revisit and adjust the session-path semantic logic later.

@qin-ctx qin-ctx closed this Mar 21, 2026
@github-project-automation github-project-automation bot moved this from Backlog to Done in OpenViking project Mar 21, 2026
@qin-ctx qin-ctx reopened this Mar 21, 2026


Development

Successfully merging this pull request may close these issues.

Memory extraction triggers O(n²) semantic reprocessing — token cost grows quadratically with memory count
