Skip to content

Conversation

@justinfranco
Copy link
Contributor

Currently on startup if an origin is failing to connect it isn't added as a connection and isn't retried. This updated the logic so that even failing origins at startup are retried until connected or the stream is closed.

@justinfranco justinfranco requested a review from brunotm January 8, 2026 20:49
@justinfranco justinfranco requested a review from a team as a code owner January 8, 2026 20:49
@github-actions
Copy link

github-actions bot commented Jan 8, 2026

👋 justinfranco, thanks for creating this pull request!

To help reviewers, please consider creating future PRs as drafts first. This allows you to self-review and make any final changes before notifying the team.

Once you're ready, you can mark it as "Ready for review" to request feedback. Thanks!

Copy link
Collaborator

@akuzni2 akuzni2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested prior version code with new test and confirmed failure

if err != nil {
c.config.logInfo("client: failed to connect to origin %s: %s", origins[x], err)
errs = append(errs, fmt.Errorf("origin %s: %w", origins[x], err))
// Retry connecting to the origin in the background
Copy link
Collaborator

@akuzni2 akuzni2 Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved so bugfix could be published - but if time permits I'd like to see if we could simplify this to avoid having 2 initialization points for conn create, monitoring, tracking. Something like this might work

var wg sync.WaitGroup
errs := make([]error, len(origins))

for x := 0; x < len(origins); x++ {
    wg.Add(1)
    go func(i int) {
        defer wg.Done()
        conn, err := s.newWSconnWithRetry(origins[i])
        if err != nil {
            errs[i] = fmt.Errorf("origin %s: %w", origins[i], err)
            return
        }
        s.conns = append(s.conns, conn) // Note: may need a mutex
        go s.monitorConn(conn)
    }(x)
}
wg.Wait()

// Only fail if we couldn't connect to ANY origins
if len(s.conns) == 0 {
    return nil, fmt.Errorf("failed to connect to any origins in HA mode: %v", errs)
}
// ...

I believe if neither origin is reachable then this logic will make sure that newWSconnWithRetry returns with an error and we can fail newStream

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I do agree I think just using the new connection method with retries would simplify things. I decided to keep both so that the newStream fails quickly if both origins aren't reachable to provide immediate feedback rather than going through all the retries ( 5-30 seconds).

I'd like to get this fix out though so I'll go ahead and merge the PR. I'm happy to discuss options here and make some further improvements though.

@justinfranco justinfranco merged commit 59666f4 into main Jan 9, 2026
9 checks passed
@justinfranco justinfranco deleted the ds-2069-fix-origin-handling-on-startup branch January 9, 2026 02:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants