fix(queuefs): dedupe memory semantic parent enqueues (#769) #792
Protocol-zero-0 wants to merge 1 commit into volcengine:main
Conversation
…olcengine#769) Repeated memory writes enqueue the same parent directory for `.overview`/`.abstract` regeneration, causing redundant VLM work. Skip duplicate enqueues for the same (account, user, agent, uri) within 45s; resource/session semantic paths are unchanged. Fixes volcengine#769. Made-with: Cursor
Pull request overview
This PR mitigates excessive directory-level semantic recomputation for memory writes by coalescing repeated enqueues targeting the same parent directory within a short time window, addressing issue #769’s cost amplification on large memory folders.
Changes:
- Added a 45s in-process dedupe window for `context_type == "memory"` in `SemanticQueue.enqueue`, returning `"deduplicated"` when coalesced.
- Implemented a softly bounded in-memory tracking map with periodic stale-key pruning.
- Added pytest coverage to ensure same-parent memory enqueues are deduped while other URIs/context types are not.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `openviking/storage/queuefs/semantic_queue.py` | Adds memory-only enqueue deduplication with locking and periodic pruning. |
| `tests/storage/test_semantic_queue_memory_dedupe.py` | Adds tests validating dedupe behavior for memory context and non-dedupe for other cases. |
```python
if msg.context_type == "memory":
    key = self._memory_parent_semantic_key(msg)
    now = time.monotonic()
    with self._memory_parent_semantic_lock:
        last = self._memory_parent_semantic_last.get(key, 0.0)
        if now - last < _MEMORY_PARENT_SEMANTIC_DEDUPE_SEC:
            logger.debug(
                "[SemanticQueue] Skipping duplicate memory semantic enqueue for %s "
                "(within %.0fs dedupe window; see #769)",
                msg.uri,
                _MEMORY_PARENT_SEMANTIC_DEDUPE_SEC,
            )
            return "deduplicated"
```
`enqueue()` is async but uses a `threading.Lock()` critical section. Under contention this can block the event loop (and it currently includes a `logger.debug(...)` call inside the lock). Consider using an `asyncio.Lock` if this is only accessed from the event loop, or otherwise minimizing the locked region (compute the decision under the lock, then log/return after releasing).
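The reviewer's suggestion can be sketched as follows. This is a minimal illustration with assumed names (`DedupeSketch`, `should_dedupe`), not the PR's actual code: the dedupe decision and timestamp update happen under an `asyncio.Lock`, and any logging would happen after the lock is released.

```python
import asyncio
import time

_DEDUPE_SEC = 45.0  # matches the PR's 45s window


class DedupeSketch:
    """Sketch of the review suggestion: decide under an asyncio.Lock,
    then log/return after releasing. Names here are illustrative."""

    def __init__(self) -> None:
        # asyncio.Lock suspends the coroutine instead of blocking the event loop.
        self._lock = asyncio.Lock()
        self._last: dict = {}  # dedupe key -> last-enqueue monotonic timestamp

    async def should_dedupe(self, key) -> bool:
        now = time.monotonic()
        async with self._lock:
            last = self._last.get(key)
            duplicate = last is not None and (now - last) < _DEDUPE_SEC
            if not duplicate:
                self._last[key] = now
        # Any logger.debug(...) call would go here, outside the locked region.
        return duplicate
```

A first call for a key returns `False` (enqueue proceeds), and a second call for the same key inside the window returns `True` (skip).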
```python
if len(self._memory_parent_semantic_last) > 2000:
    cutoff = now - (_MEMORY_PARENT_SEMANTIC_DEDUPE_SEC * 4)
    stale = [k for k, t in self._memory_parent_semantic_last.items() if t < cutoff]
    for k in stale[:800]:
        self._memory_parent_semantic_last.pop(k, None)
```
The stale-key pruning path does an O(n) scan of the entire `_memory_parent_semantic_last` dict (`stale = [...]`) while holding the lock, and the map is only softly bounded (it can grow well beyond 2000 if the enqueue rate is high within the cutoff window). This can become a hot-path bottleneck during bursts. Consider switching to a real bounded TTL/LRU structure (e.g., a deque of `(ts, key)` plus a dict, or a TTL cache) and pruning incrementally so per-enqueue work stays O(1) amortized.
Summary
Fixes volcengine/OpenViking#769.
Each memory write can enqueue the same parent directory for directory-level semantic regeneration (overview/abstract markdown). With no coalescing, rapid writes cause repeated expensive VLM work on large memory folders.
Change
In `SemanticQueue.enqueue`, for `context_type == "memory"` only, skip enqueueing when the same dedupe key `(account_id, user_id, agent_id, uri)` was enqueued within the last 45 seconds. Return `"deduplicated"` (no backend write) in that case; resource and other context types are unchanged.
Tests
`tests/storage/test_semantic_queue_memory_dedupe.py`: same parent twice → one `NamedQueue.enqueue`; different URIs / `resource` → no dedupe.
Trade-off
Directory-level semantics may lag up to one window after burst writes; matches the short-term mitigation described in the issue.