Skip to content

Conversation

@tsivaprasad
Copy link
Contributor

@tsivaprasad tsivaprasad commented Jan 20, 2026

Summary

This PR implements automatic detection and update of etcd member peer URLs when host IP or hostname changes. Each node now advertises both IP and hostname-based URLs for improved resilience.

Changes

  • Update PeerEndpoints() and ClientEndpoints() to return both IP and hostname URLs
  • Update embedConfig() to advertise both IP and hostname URLs
  • Add UpdateMemberPeerURLs() to detect and update peer URLs

Testing

  • Added test cases under the etcd_lifecycle_test package
  • Manually verified create-database and other normal operations

Checklist

PLAT-161

Summary by CodeRabbit

  • New Features

    • Improved etcd resilience by supporting both IP and hostname-based endpoints, automatically updating peer URLs when host network configurations change.
  • Tests

    • Added test coverage for dual IP and hostname endpoint variants in etcd cluster configurations.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Jan 20, 2026

📝 Walkthrough

Walkthrough

This PR enhances etcd resilience by introducing hostname-based endpoint support alongside IP-based endpoints. Changes include adding hostname variants to endpoint emissions, implementing peer URL updates during startup via new UpdateMemberPeerURLs function, adding a SetSymmetricDifference utility, and updating embedConfig to advertise both IP and hostname endpoints. Comprehensive tests verify dual-endpoint functionality in standalone and multi-member scenarios.

Changes

Cohort / File(s) Summary
etcd Endpoint and Membership Updates
server/internal/etcd/embedded.go, server/internal/etcd/embedded_test.go, server/internal/etcd/membership.go, server/internal/etcd/membership_test.go
Added hostname variants to ClientEndpoints and PeerEndpoints; updated embedConfig to advertise both IP and hostname URLs; implemented UpdateMemberPeerURLs function to propagate peer URL changes during startup; added integration tests verifying dual-endpoint exposure for both IP and hostname, with test coverage for standalone and multi-member clusters.
Set Utility
server/internal/ds/set.go
Added SetSymmetricDifference generic function that computes symmetric difference between two slices, enabling comparison of peer URL sets regardless of order.
Changelog
changes/unreleased/Added-*.yaml
Added release note documenting improved etcd resilience through peer URL updates.

Sequence Diagram(s)

sequenceDiagram
    participant Server as Embedded Server
    participant Client as etcd Client
    participant Member as Member API
    participant Config as Member Config

    Server->>Server: Start etcd instance
    Server->>Client: Obtain internal client
    Client-->>Server: Client ready
    Server->>Member: Call UpdateMemberPeerURLs
    Member->>Member: List all members
    Member->>Member: Find target member by name
    Member->>Member: Compare new URLs vs existing<br/>(using SetSymmetricDifference)
    alt URLs changed
        Member->>Config: Update member PeerURLs
        Config-->>Member: Update successful
        Member-->>Server: Changed=true
        Server->>Server: Log info: URLs updated
    else URLs unchanged
        Member-->>Server: Changed=false
        Server->>Server: Log info: No changes needed
    end
    Note over Server: Continue normal startup
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • jason-lynch
  • mmols

Poem

🐰 A fluffy tale of endpoints true,
Both IP and hostname, a clever two-step brew!
When networks shift and addresses dance,
Our resilient etcd gets a second chance. 🌟

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 14.29% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly describes the main change: enabling host identity changes by updating peer URLs for improved stability, which aligns with the core functionality added across etcd modules.
Description check ✅ Passed The description includes Summary, Changes, Testing, and Checklist sections matching the template. All required sections are present with substantive content addressing the PR objectives.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@server/internal/etcd/embedded.go`:
- Around line 684-690: InitialCluster is built with "http://" while
AdvertisePeerUrls uses "https://", causing protocol mismatch; update the
InitialCluster construction (the fmt.Sprintf that sets c.InitialCluster using
cfg.HostID, advertiseHost, peerPort) to use the same scheme as AdvertisePeerUrls
(use "https://" instead of "http://") so peer URLs match TLS-enabled peers
during bootstrap; ensure the change is made where c.InitialCluster is assigned
and that it mirrors the pattern used later in the initialization code.
🧹 Nitpick comments (5)
clustertest/ip_hostname_test.go (3)

13-37: Test name doesn't match behavior.

This test doesn't actually verify hostname usage when IP is unreachable - it only creates a cluster and verifies it's healthy. Consider renaming to TestClusterInitializesSuccessfully or adding actual IP unreachability verification.


56-65: Redundant Stop before RecreateWithEnv.

RecreateWithEnv already terminates the container internally (via Terminate), so the explicit Stop call on line 59 is unnecessary.

Suggested simplification
 	host3 := cluster.Host("host-3")
 
-	tLog(t, "stopping host-3")
-	host3.Stop(t)
-
 	tLog(t, "recreating host-3 with an unreachable IP address (simulating IP change)")
 	
 	host3.RecreateWithEnv(t, map[string]string{
 		"PGEDGE_IPV4_ADDRESS": "10.99.99.99", 
 	})

157-195: Test name suggests write/read verification but doesn't perform any.

TestEtcdWriteReadUsesHostnameWhenIPUnreachable implies it tests etcd key-value operations work with hostname connectivity, but it only verifies cluster health (same as other tests). Consider either:

  1. Renaming to TestClusterHealthyAfterHostIPChange, or
  2. Adding actual etcd write/read operations to verify data operations work
server/internal/etcd/embedded.go (1)

773-783: Function name isIPReachable is misleading - it tests local bindability.

This function checks if the host can bind to the IP (i.e., the IP is assigned to a local interface), not whether the IP is reachable over the network. The implementation is correct for the use case, but consider renaming for clarity.

Suggested rename
-func isIPReachable(ip string, logger zerolog.Logger) bool {
+func isIPBindable(ip string, logger zerolog.Logger) bool {
 	addr := fmt.Sprintf("%s:0", ip)
 	listener, err := net.Listen("tcp", addr)
 	if err != nil {
-		logger.Warn().Str("ip", ip).Err(err).Msg("cannot reach IP address")
+		logger.Warn().Str("ip", ip).Err(err).Msg("cannot bind to IP address")
 		return false
 	}
 	listener.Close()
-	logger.Info().Str("ip", ip).Msg("successfully reachable to IP address")
+	logger.Info().Str("ip", ip).Msg("IP address is bindable locally")
 	return true
 }

Also update the call site at line 659:

-	if cfg.Hostname != "" && !isIPReachable(cfg.IPv4Address, logger) {
+	if cfg.Hostname != "" && !isIPBindable(cfg.IPv4Address, logger) {
clustertest/host_test.go (1)

47-51: Missing blank line between RecreateConfig and NewHost.

There should be a blank line between the struct definition and the next function for consistency with Go formatting conventions.

Suggested fix
 type RecreateConfig struct {
 	EtcdMode EtcdMode
 	ExtraEnv map[string]string
 }
+
 func NewHost(t testing.TB, config HostConfig) *Host {

Comment on lines 684 to 690
// This will get overridden when joining an existing cluster
c.InitialCluster = fmt.Sprintf(
"%s=http://%s:%d",
cfg.HostID,
myIP,
advertiseHost,
peerPort,
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the actual file content around the specified lines
cat -n server/internal/etcd/embedded.go | sed -n '680,695p'

Repository: pgEdge/control-plane

Length of output: 679


🏁 Script executed:

# Search for InitialCluster usage patterns to understand how it's used
rg -n "InitialCluster" --type go -C 3

Repository: pgEdge/control-plane

Length of output: 2082


🏁 Script executed:

# Check if there are any etcd-related config or documentation files
fd -e md -e txt | grep -i etcd | head -20

Repository: pgEdge/control-plane

Length of output: 46


🏁 Script executed:

# Check what etcd expects for InitialCluster and search for documentation
rg -n "PeerURLs\|peer.*url" --type go server/internal/etcd/embedded.go | head -20

Repository: pgEdge/control-plane

Length of output: 46


🏁 Script executed:

# Look at the full context around both cases (production and localhost) to understand the pattern
cat -n server/internal/etcd/embedded.go | sed -n '675,735p'

Repository: pgEdge/control-plane

Length of output: 2390


Protocol mismatch: InitialCluster uses http:// but peer URLs use https://.

In the production configuration, AdvertisePeerUrls (line 682) uses https://, but InitialCluster (line 686) uses http://. This inconsistency will cause TLS handshake failures during cluster bootstrap. The initialization configuration (lines 715–731) shows the correct pattern where both use the same protocol.

Suggested fix
 	// This will get overridden when joining an existing cluster
 	c.InitialCluster = fmt.Sprintf(
-		"%s=http://%s:%d",
+		"%s=https://%s:%d",
 		cfg.HostID,
 		advertiseHost,
 		peerPort,
 	)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// This will get overridden when joining an existing cluster
c.InitialCluster = fmt.Sprintf(
"%s=http://%s:%d",
cfg.HostID,
myIP,
advertiseHost,
peerPort,
)
// This will get overridden when joining an existing cluster
c.InitialCluster = fmt.Sprintf(
"%s=https://%s:%d",
cfg.HostID,
advertiseHost,
peerPort,
)
🤖 Prompt for AI Agents
In `@server/internal/etcd/embedded.go` around lines 684 - 690, InitialCluster is
built with "http://" while AdvertisePeerUrls uses "https://", causing protocol
mismatch; update the InitialCluster construction (the fmt.Sprintf that sets
c.InitialCluster using cfg.HostID, advertiseHost, peerPort) to use the same
scheme as AdvertisePeerUrls (use "https://" instead of "http://") so peer URLs
match TLS-enabled peers during bootstrap; ensure the change is made where
c.InitialCluster is assigned and that it mirrors the pattern used later in the
initialization code.

@tsivaprasad tsivaprasad marked this pull request as draft January 21, 2026 16:03
@tsivaprasad tsivaprasad changed the title feat(etcd): enabled hostname-based connectivity when IP is unreachable feat(etcd): enable host identity changes by updating peer URLs for improved stability Jan 22, 2026
@tsivaprasad tsivaprasad marked this pull request as ready for review January 22, 2026 19:52
@tsivaprasad tsivaprasad merged commit d6ff18b into main Jan 22, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants