-
Notifications
You must be signed in to change notification settings - Fork 2
feat(etcd): enable host identity changes by updating peer URLs for improved stability #254
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(etcd): enable host identity changes by updating peer URLs for improved stability #254
Conversation
📝 WalkthroughWalkthroughThis PR enhances etcd resilience by introducing hostname-based endpoint support alongside IP-based endpoints. Changes include adding hostname variants to endpoint emissions, implementing peer URL updates during startup via new UpdateMemberPeerURLs function, adding a SetSymmetricDifference utility, and updating embedConfig to advertise both IP and hostname endpoints. Comprehensive tests verify dual-endpoint functionality in standalone and multi-member scenarios. Changes
Sequence Diagram(s)sequenceDiagram
participant Server as Embedded Server
participant Client as etcd Client
participant Member as Member API
participant Config as Member Config
Server->>Server: Start etcd instance
Server->>Client: Obtain internal client
Client-->>Server: Client ready
Server->>Member: Call UpdateMemberPeerURLs
Member->>Member: List all members
Member->>Member: Find target member by name
Member->>Member: Compare new URLs vs existing<br/>(using SetSymmetricDifference)
alt URLs changed
Member->>Config: Update member PeerURLs
Config-->>Member: Update successful
Member-->>Server: Changed=true
Server->>Server: Log info: URLs updated
else URLs unchanged
Member-->>Server: Changed=false
Server->>Server: Log info: No changes needed
end
Note over Server: Continue normal startup
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@server/internal/etcd/embedded.go`:
- Around line 684-690: InitialCluster is built with "http://" while
AdvertisePeerUrls uses "https://", causing protocol mismatch; update the
InitialCluster construction (the fmt.Sprintf that sets c.InitialCluster using
cfg.HostID, advertiseHost, peerPort) to use the same scheme as AdvertisePeerUrls
(use "https://" instead of "http://") so peer URLs match TLS-enabled peers
during bootstrap; ensure the change is made where c.InitialCluster is assigned
and that it mirrors the pattern used later in the initialization code.
🧹 Nitpick comments (5)
clustertest/ip_hostname_test.go (3)
13-37: Test name doesn't match behavior.This test doesn't actually verify hostname usage when IP is unreachable - it only creates a cluster and verifies it's healthy. Consider renaming to
TestClusterInitializesSuccessfullyor adding actual IP unreachability verification.
56-65: RedundantStopbeforeRecreateWithEnv.
RecreateWithEnvalready terminates the container internally (viaTerminate), so the explicitStopcall on line 59 is unnecessary.Suggested simplification
host3 := cluster.Host("host-3") - tLog(t, "stopping host-3") - host3.Stop(t) - tLog(t, "recreating host-3 with an unreachable IP address (simulating IP change)") host3.RecreateWithEnv(t, map[string]string{ "PGEDGE_IPV4_ADDRESS": "10.99.99.99", })
157-195: Test name suggests write/read verification but doesn't perform any.
TestEtcdWriteReadUsesHostnameWhenIPUnreachableimplies it tests etcd key-value operations work with hostname connectivity, but it only verifies cluster health (same as other tests). Consider either:
- Renaming to
TestClusterHealthyAfterHostIPChange, or- Adding actual etcd write/read operations to verify data operations work
server/internal/etcd/embedded.go (1)
773-783: Function nameisIPReachableis misleading - it tests local bindability.This function checks if the host can bind to the IP (i.e., the IP is assigned to a local interface), not whether the IP is reachable over the network. The implementation is correct for the use case, but consider renaming for clarity.
Suggested rename
-func isIPReachable(ip string, logger zerolog.Logger) bool { +func isIPBindable(ip string, logger zerolog.Logger) bool { addr := fmt.Sprintf("%s:0", ip) listener, err := net.Listen("tcp", addr) if err != nil { - logger.Warn().Str("ip", ip).Err(err).Msg("cannot reach IP address") + logger.Warn().Str("ip", ip).Err(err).Msg("cannot bind to IP address") return false } listener.Close() - logger.Info().Str("ip", ip).Msg("successfully reachable to IP address") + logger.Info().Str("ip", ip).Msg("IP address is bindable locally") return true }Also update the call site at line 659:
- if cfg.Hostname != "" && !isIPReachable(cfg.IPv4Address, logger) { + if cfg.Hostname != "" && !isIPBindable(cfg.IPv4Address, logger) {clustertest/host_test.go (1)
47-51: Missing blank line betweenRecreateConfigandNewHost.There should be a blank line between the struct definition and the next function for consistency with Go formatting conventions.
Suggested fix
type RecreateConfig struct { EtcdMode EtcdMode ExtraEnv map[string]string } + func NewHost(t testing.TB, config HostConfig) *Host {
| // This will get overridden when joining an existing cluster | ||
| c.InitialCluster = fmt.Sprintf( | ||
| "%s=http://%s:%d", | ||
| cfg.HostID, | ||
| myIP, | ||
| advertiseHost, | ||
| peerPort, | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
# First, let's look at the actual file content around the specified lines
cat -n server/internal/etcd/embedded.go | sed -n '680,695p'Repository: pgEdge/control-plane
Length of output: 679
🏁 Script executed:
# Search for InitialCluster usage patterns to understand how it's used
rg -n "InitialCluster" --type go -C 3Repository: pgEdge/control-plane
Length of output: 2082
🏁 Script executed:
# Check if there are any etcd-related config or documentation files
fd -e md -e txt | grep -i etcd | head -20Repository: pgEdge/control-plane
Length of output: 46
🏁 Script executed:
# Check what etcd expects for InitialCluster and search for documentation
rg -n "PeerURLs\|peer.*url" --type go server/internal/etcd/embedded.go | head -20Repository: pgEdge/control-plane
Length of output: 46
🏁 Script executed:
# Look at the full context around both cases (production and localhost) to understand the pattern
cat -n server/internal/etcd/embedded.go | sed -n '675,735p'Repository: pgEdge/control-plane
Length of output: 2390
Protocol mismatch: InitialCluster uses http:// but peer URLs use https://.
In the production configuration, AdvertisePeerUrls (line 682) uses https://, but InitialCluster (line 686) uses http://. This inconsistency will cause TLS handshake failures during cluster bootstrap. The initialization configuration (lines 715–731) shows the correct pattern where both use the same protocol.
Suggested fix
// This will get overridden when joining an existing cluster
c.InitialCluster = fmt.Sprintf(
- "%s=http://%s:%d",
+ "%s=https://%s:%d",
cfg.HostID,
advertiseHost,
peerPort,
)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // This will get overridden when joining an existing cluster | |
| c.InitialCluster = fmt.Sprintf( | |
| "%s=http://%s:%d", | |
| cfg.HostID, | |
| myIP, | |
| advertiseHost, | |
| peerPort, | |
| ) | |
| // This will get overridden when joining an existing cluster | |
| c.InitialCluster = fmt.Sprintf( | |
| "%s=https://%s:%d", | |
| cfg.HostID, | |
| advertiseHost, | |
| peerPort, | |
| ) |
🤖 Prompt for AI Agents
In `@server/internal/etcd/embedded.go` around lines 684 - 690, InitialCluster is
built with "http://" while AdvertisePeerUrls uses "https://", causing protocol
mismatch; update the InitialCluster construction (the fmt.Sprintf that sets
c.InitialCluster using cfg.HostID, advertiseHost, peerPort) to use the same
scheme as AdvertisePeerUrls (use "https://" instead of "http://") so peer URLs
match TLS-enabled peers during bootstrap; ensure the change is made where
c.InitialCluster is assigned and that it mirrors the pattern used later in the
initialization code.
Summary
This PR implements automatic detection and update of etcd member peer URLs when host IP or hostname changes. Each node now advertises both IP and hostname-based URLs for improved resilience.
Changes
Testing
Checklist
PLAT-161
Summary by CodeRabbit
New Features
Tests
✏️ Tip: You can customize this high-level summary in your review settings.