RPC: Fix getObject() desync bugs #6835
Merged
knutwannheden merged 3 commits into main on Feb 27, 2026
getObject() used localObjects as the diff baseline when receiving from the remote peer. But the remote computes diffs against the last synced state (remoteObjects), not the local state. When the local side modifies a tree (e.g., via a local recipe) before calling getObject(), the two baselines diverge, producing a hybrid tree that corrupts remoteObjects and causes subsequent transfers to fail with IndexError/desync. Fix: use remoteObjects (the last synced state) as the baseline, matching what the remote peer uses. Python's get_object_from_java() already did this correctly; Java and TypeScript had the same bug.
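The baseline mismatch described above can be illustrated with a toy model. This is a minimal sketch, not the OpenRewrite implementation: trees are modeled as flat dicts, and `make_diff`, `apply_diff`, and the `NO_CHANGE` sentinel object are hypothetical stand-ins for the real serialization logic.

```python
NO_CHANGE = object()  # hypothetical sentinel: "field equals the baseline value"


def make_diff(baseline: dict, current: dict) -> dict:
    """Sender side: emit NO_CHANGE for fields that match the last synced state."""
    return {
        key: (NO_CHANGE if key in baseline and baseline[key] == value else value)
        for key, value in current.items()
    }


def apply_diff(baseline: dict, diff: dict) -> dict:
    """Receiver side: resolve NO_CHANGE against whichever baseline is chosen."""
    return {
        key: (baseline[key] if value is NO_CHANGE else value)
        for key, value in diff.items()
    }


# Last state both peers agreed on (plays the role of remoteObjects).
synced = {"markers": "m1", "body": "a"}
# The remote peer changed the body only.
remote_current = {"markers": "m1", "body": "b"}
# A local recipe changed the markers (plays the role of localObjects).
local_modified = {"markers": "m2-local", "body": "a"}

diff = make_diff(synced, remote_current)

# Buggy: resolving NO_CHANGE against local state yields a hybrid tree.
buggy = apply_diff(local_modified, diff)
# Fixed: resolving against the synced state reproduces the remote tree.
fixed = apply_diff(synced, diff)
```

In this model the hybrid tree keeps the locally modified markers while taking the remote body, which mirrors the Markers check in the new back-to-back test.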
…tate desync
When getObject() fails mid-deserialization (e.g., ClassCastException), the sender has already updated its tracking of what the receiver has, but the receiver never applied the change. This causes all subsequent RPC interactions for that object to compute diffs against the wrong baseline, leading to "Expected positions array" and similar desync errors.
Fix by removing remoteObjects[id] when receive() throws, so the next interaction sends a full ADD (no delta), re-synchronizing both sides.
Also fix Python's handle_visit() to always fetch the tree from Java via get_object_from_java(), matching the JS implementation. The previous code used a stale local_objects cache, which meant Python could operate on an outdated tree version after Java-side modifications or error recovery.
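The recovery step can be sketched roughly as follows. This is a simplified model under stated assumptions: `get_object`, the `remote_objects` dict, and the `receive` callback are hypothetical names, and how the sender learns that it must send a full ADD is not modeled here.

```python
from typing import Any, Callable, Dict, Optional


def get_object(
    obj_id: str,
    remote_objects: Dict[str, Any],
    receive: Callable[[Optional[Any]], Any],
) -> Any:
    """Receive an object, dropping the synced baseline if deserialization fails."""
    baseline = remote_objects.get(obj_id)  # None means "no baseline, expect a full ADD"
    try:
        obj = receive(baseline)
    except Exception:
        # The sender may already assume we hold the new version, so the old
        # baseline can no longer be trusted. Forget it to force a full resend.
        remote_objects.pop(obj_id, None)
        raise
    remote_objects[obj_id] = obj
    return obj
```

After a failed call, the next `get_object` for the same id starts from `None` instead of a baseline that the two sides no longer agree on.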
…oundary
When serialized data items are an exact multiple of the handler's batchSize, END_OF_OBJECT ends up alone in a separate batch that receiver.receive() never pulls. Drain the pending batch after receive() completes, analogous to Java/JS's explicit q.take(). Add integration test with small batchSize to exercise the fix.
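The boundary case can be reproduced with a small model. This is an illustrative sketch only: `batches`, `BatchReader`, `receive`, and the explicit `count` argument are assumptions made for the example and do not reflect the real get_object_from_java code.

```python
from collections import deque

END_OF_OBJECT = "END_OF_OBJECT"


def batches(items, batch_size):
    """Split the items plus a trailing END_OF_OBJECT into fixed-size batches."""
    stream = list(items) + [END_OF_OBJECT]
    return deque(stream[i:i + batch_size] for i in range(0, len(stream), batch_size))


class BatchReader:
    def __init__(self, pending):
        self.pending = pending   # batches not yet pulled from the peer
        self.current = deque()   # data items of the batches pulled so far
        self.saw_end = False     # flag set once END_OF_OBJECT has been stripped

    def _pull(self):
        batch = self.pending.popleft()
        if batch and batch[-1] == END_OF_OBJECT:
            self.saw_end = True
            batch = batch[:-1]
        self.current.extend(batch)

    def next_item(self):
        while not self.current:
            self._pull()
        return self.current.popleft()

    def drain(self):
        """Pull remaining batches until END_OF_OBJECT has been seen."""
        while not self.saw_end and self.pending:
            self._pull()


def receive(reader, count):
    """Pull exactly `count` data items, fetching batches only on demand."""
    return [reader.next_item() for _ in range(count)]
```

With four items and a batch size of two, `receive` consumes two batches and leaves a third batch holding only END_OF_OBJECT; calling `drain()` afterwards picks it up. With three items the marker shares the second batch and no drain is needed.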
Summary
Three fixes for RPC state desynchronization issues during composite recipe execution.
1. Fix getObject() diff baseline
getObject() in Java and TypeScript used localObjects as the diff baseline when receiving trees from the remote peer. The remote computes diffs against remoteObjects (the last synced state). When the local side modifies a tree via a local recipe before calling getObject(), the baselines diverge, producing corrupt data.
Fixed both Java and TypeScript to use remoteObjects as the baseline. Added a back-to-back RPC test that reproduces the bug by verifying that Markers (an object field sent as NO_CHANGE) comes from the synced baseline, not from locally modified state.
2. Reset remote object tracking on getObject() failure
When getObject() fails mid-deserialization, the sender has already updated its tracking of what the receiver has, but the receiver never applied the change. Subsequent operations compute diffs against the wrong baseline.
Fixed by removing remoteObjects[id] when receive() throws in all three implementations (Java, TypeScript, Python).
3. Fix Python get_object_from_java missing END_OF_OBJECT on batch boundary
Python's get_object_from_java strips END_OF_OBJECT from batches and tracks it with a flag. When the number of serialized data items is an exact multiple of the handler's batchSize, END_OF_OBJECT ends up alone in a separate batch that receive() never pulls.
Fixed by draining the pending batch after receive() completes, analogous to Java/JS's explicit q.take().
Test plan
- rewrite-core RPC tests pass (including new getObjectUsesRemoteObjectsAsBaseline test)
- rewrite-javascript RPC tests pass