-
Notifications
You must be signed in to change notification settings - Fork 10
DS-2069 Retry failing origins on startup #59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
👋 justinfranco, thanks for creating this pull request! To help reviewers, please consider creating future PRs as drafts first. This allows you to self-review and make any final changes before notifying the team. Once you're ready, you can mark it as "Ready for review" to request feedback. Thanks! |
akuzni2
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested prior version code with new test and confirmed failure
| if err != nil { | ||
| c.config.logInfo("client: failed to connect to origin %s: %s", origins[x], err) | ||
| errs = append(errs, fmt.Errorf("origin %s: %w", origins[x], err)) | ||
| // Retry connecting to the origin in the background |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved so bugfix could be published - but if time permits I'd like to see if we could simplify this to avoid having 2 initialization points for conn create, monitoring, tracking. Something like this might work
var wg sync.WaitGroup
errs := make([]error, len(origins))
for x := 0; x < len(origins); x++ {
wg.Add(1)
go func(i int) {
defer wg.Done()
conn, err := s.newWSconnWithRetry(origins[i])
if err != nil {
errs[i] = fmt.Errorf("origin %s: %w", origins[i], err)
return
}
s.conns = append(s.conns, conn) // Note: may need a mutex
go s.monitorConn(conn)
}(x)
}
wg.Wait()
// Only fail if we couldn't connect to ANY origins
if len(s.conns) == 0 {
return nil, fmt.Errorf("failed to connect to any origins in HA mode: %v", errs)
}
// ...
I believe if neither origin is reachable then this logic will make sure that newWSconnWithRetry returns with an error and we can fail newStream
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I do agree I think just using the new connection method with retries would simplify things. I decided to keep both so that the newStream fails quickly if both origins aren't reachable to provide immediate feedback rather than going through all the retries ( 5-30 seconds).
I'd like to get this fix out though so I'll go ahead and merge the PR. I'm happy to discuss options here and make some further improvements though.
Currently on startup if an origin is failing to connect it isn't added as a connection and isn't retried. This updated the logic so that even failing origins at startup are retried until connected or the stream is closed.