Skip to content

Conversation

@moizpgedge
Copy link
Contributor

@moizpgedge moizpgedge commented Jan 19, 2026

  • Check spock.wait_for_sync_event return value instead of ignoring it

  • Change WaitForSyncEvent to return Query[bool] and use Scalar()

  • Fail add-node when sync isn’t confirmed (ErrReplicationSyncNotConfirmed)

Summary
Ensure add-node operations fail when spock.wait_for_sync_event does not confirm replication catch-up, instead of reporting success while data is missing.

Changes
Update postgres.WaitForSyncEvent to return Query[bool] so callers can read the result.
Check the boolean return value in WaitForSyncEventResource and fail when synced == false.
Add database.ErrReplicationSyncNotConfirmed for clearer error reporting.

Testing
go test ./...
make test-e2e (requires a running local control-plane cluster or an E2E_FIXTURE; currently fails with connect: connection refused if no cluster is running)

Checklist
[ ] Tests added or updated (unit and/or e2e, as needed)
[ ] Documentation updated (if needed)
[x] Issue is linked Issue
[ ] Changelog entry added for user-facing behavior changes
[ ] Breaking changes (if any) are clearly called out in the PR description

Notes for Reviewers
This change fixes a workflow correctness issue: previously we executed spock.wait_for_sync_event but discarded its boolean result, allowing workflows to succeed even when sync was not confirmed (e.g., due to underlying replication/subscription issues).

Summary by CodeRabbit

  • Bug Fixes
    • Enhanced database replication verification to enforce synchronization confirmation and provide detailed diagnostic error messages including provider, subscriber, and LSN information when replication is not confirmed within the timeout period.

✏️ Tip: You can customize this high-level summary in your review settings.

- Check spock.wait_for_sync_event return value instead of ignoring it

- Change WaitForSyncEvent to return Query[bool] and use Scalar()

- Fail add-node when sync isn’t confirmed (ErrReplicationSyncNotConfirmed)
@coderabbitai
Copy link

coderabbitai bot commented Jan 19, 2026

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Walkthrough

These changes modify the WaitForSyncEvent operation to return a boolean scalar value instead of a statement, replacing Exec with Scalar and adding explicit error handling for replication synchronization confirmation failures.

Changes

Cohort / File(s) Summary
WaitForSyncEvent Return Type & Error Handling
server/internal/postgres/create_db.go, server/internal/database/wait_for_sync_event_resource.go
Updated WaitForSyncEvent return type from Statement to Query[bool]. Replaced Exec call with Scalar to retrieve boolean synced status. Enhanced error handling: ErrNoRows maps to resource.ErrNotFound, other errors wrapped. Added explicit check for unsynchronized state that returns detailed replication confirmation error with provider, subscriber, LSN, and timeout information.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 A query now returns what we need to know,
Boolean truth of sync's gentle flow,
No more silent steps through the night,
Errors speak clear when sync's not right!

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix: fail add-node when spock sync event is not confirmed' directly and clearly describes the main change: making add-node operations fail when replication synchronization is not confirmed, which aligns perfectly with the changeset modifications to WaitForSyncEvent and error handling.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Warning

Tools execution failed with the following error:

Failed to run tools: 13 INTERNAL: Received RST_STREAM with code 2 (Internal server error)


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

} else if err != nil {
return fmt.Errorf("failed to wait for sync event on subscriber: %w", err)
}
if !synced {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After waiting for replication to catch up, this code checks Spock’s result.

  • If synced is false, it means the subscriber node did not reach the required replication point (the expected LSN) before the timeout.
  • Because of that, the control plane treats it as a real failure and stops the workflow.
  • The error message includes the provider node, subscriber node, the LSN it was waiting for, and the timeout, so anyone reading the logs can quickly see what sync check failed and why.

@moizpgedge moizpgedge requested review from Copilot and tsivaprasad and removed request for Copilot January 19, 2026 14:42
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a critical workflow correctness issue where spock.wait_for_sync_event results were ignored, allowing add-node operations to succeed even when replication synchronization failed.

Changes:

  • Modified WaitForSyncEvent to return Query[bool] instead of Statement to capture the sync confirmation result
  • Added explicit check of the boolean return value in WaitForSyncEventResource.Refresh to fail when sync is not confirmed
  • Introduced ErrReplicationSyncNotConfirmed error for clearer error reporting when replication sync fails

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
server/internal/postgres/create_db.go Changed WaitForSyncEvent return type from Statement to Query[bool] to enable result checking
server/internal/database/wait_for_sync_event_resource.go Added sync result validation and error handling when synced == false
server/internal/database/service.go Added ErrReplicationSyncNotConfirmed error definition

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

r.ProviderNode,
r.SubscriberNode,
syncEvent.SyncEventLsn,
100,
Copy link

Copilot AI Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timeout value 100 is hardcoded in multiple places (line 76 and line 89). Consider extracting this magic number into a named constant to improve maintainability and ensure consistency.

Copilot uses AI. Check for mistakes.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@server/internal/database/wait_for_sync_event_resource.go`:
- Around line 76-90: The apiErr() mapping is missing
database.ErrReplicationSyncNotConfirmed so the Goa layer returns a generic 500;
update the apiErr function to add a case that converts
database.ErrReplicationSyncNotConfirmed into a semantic HTTP error (choose 409
Conflict or 504 Gateway Timeout per domain semantics) and return the
corresponding Goa error value (following the pattern used for other domain
errors in apiErr), ensuring the mapping references
database.ErrReplicationSyncNotConfirmed and the same Goa error constructor used
elsewhere.

@moizpgedge moizpgedge marked this pull request as draft January 19, 2026 14:46
Remove unused sentinel error and return a hardcoded message when sync isn't confirmed.
@jason-lynch
Copy link
Member

This is working as expected in my testing. Nice work!

@moizpgedge moizpgedge marked this pull request as ready for review January 21, 2026 14:59
@moizpgedge moizpgedge merged commit 90866e5 into main Jan 21, 2026
2 of 3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants