-
Notifications
You must be signed in to change notification settings - Fork 2
fix: fail add-node when spock sync event is not confirmed #252
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: fail add-node when spock sync event is not confirmed #252
Conversation
- Check spock.wait_for_sync_event return value instead of ignoring it - Change WaitForSyncEvent to return Query[bool] and use Scalar() - Fail add-node when sync isn’t confirmed (ErrReplicationSyncNotConfirmed)
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. 📝 WalkthroughWalkthroughThese changes modify the Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Warning Tools execution failed with the following error: Failed to run tools: 13 INTERNAL: Received RST_STREAM with code 2 (Internal server error) Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
| } else if err != nil { | ||
| return fmt.Errorf("failed to wait for sync event on subscriber: %w", err) | ||
| } | ||
| if !synced { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After waiting for replication to catch up, this code checks Spock’s result.
- If synced is false, it means the subscriber node did not reach the required replication point (the expected LSN) before the timeout.
- Because of that, the control plane treats it as a real failure and stops the workflow.
- The error message includes the provider node, subscriber node, the LSN it was waiting for, and the timeout, so anyone reading the logs can quickly see what sync check failed and why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes a critical workflow correctness issue where spock.wait_for_sync_event results were ignored, allowing add-node operations to succeed even when replication synchronization failed.
Changes:
- Modified
WaitForSyncEventto returnQuery[bool]instead ofStatementto capture the sync confirmation result - Added explicit check of the boolean return value in
WaitForSyncEventResource.Refreshto fail when sync is not confirmed - Introduced
ErrReplicationSyncNotConfirmederror for clearer error reporting when replication sync fails
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| server/internal/postgres/create_db.go | Changed WaitForSyncEvent return type from Statement to Query[bool] to enable result checking |
| server/internal/database/wait_for_sync_event_resource.go | Added sync result validation and error handling when synced == false |
| server/internal/database/service.go | Added ErrReplicationSyncNotConfirmed error definition |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| r.ProviderNode, | ||
| r.SubscriberNode, | ||
| syncEvent.SyncEventLsn, | ||
| 100, |
Copilot
AI
Jan 19, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The timeout value 100 is hardcoded in multiple places (line 76 and line 89). Consider extracting this magic number into a named constant to improve maintainability and ensure consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@server/internal/database/wait_for_sync_event_resource.go`:
- Around line 76-90: The apiErr() mapping is missing
database.ErrReplicationSyncNotConfirmed so the Goa layer returns a generic 500;
update the apiErr function to add a case that converts
database.ErrReplicationSyncNotConfirmed into a semantic HTTP error (choose 409
Conflict or 504 Gateway Timeout per domain semantics) and return the
corresponding Goa error value (following the pattern used for other domain
errors in apiErr), ensuring the mapping references
database.ErrReplicationSyncNotConfirmed and the same Goa error constructor used
elsewhere.
Remove unused sentinel error and return a hardcoded message when sync isn't confirmed.
|
This is working as expected in my testing. Nice work! |
Check spock.wait_for_sync_event return value instead of ignoring it
Change WaitForSyncEvent to return Query[bool] and use Scalar()
Fail add-node when sync isn’t confirmed (ErrReplicationSyncNotConfirmed)
Summary
Ensure add-node operations fail when spock.wait_for_sync_event does not confirm replication catch-up, instead of reporting success while data is missing.
Changes
Update postgres.WaitForSyncEvent to return Query[bool] so callers can read the result.
Check the boolean return value in WaitForSyncEventResource and fail when synced == false.
Add database.ErrReplicationSyncNotConfirmed for clearer error reporting.
Testing
go test ./...
make test-e2e (requires a running local control-plane cluster or an E2E_FIXTURE; currently fails with connect: connection refused if no cluster is running)
Checklist
[ ] Tests added or updated (unit and/or e2e, as needed)
[ ] Documentation updated (if needed)
[x] Issue is linked Issue
[ ] Changelog entry added for user-facing behavior changes
[ ] Breaking changes (if any) are clearly called out in the PR description
Notes for Reviewers
This change fixes a workflow correctness issue: previously we executed spock.wait_for_sync_event but discarded its boolean result, allowing workflows to succeed even when sync was not confirmed (e.g., due to underlying replication/subscription issues).
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.