fix(p2p): add exponential backoff for reacher #5371

Draft
gacevicljubisa wants to merge 4 commits into master from fix/reacher-backoff

gacevicljubisa commented Feb 17, 2026

Checklist

  • I have read the coding guide.
  • My change requires a documentation update, and I have done it.
  • I have added tests to cover my changes.
  • I have filled out the description and linked the related issues.

Description

  • Implements exponential backoff for peer ping retries, with separate caps for success and failure:
    • Success: 10m -> 20m (capped at 2x base)
    • Failure: 10m -> 20m -> 40m -> 80m (capped at 8x base)
  • Adds configurable jitter (±20%) to retry intervals to prevent synchronized retry storms.
  • Reduces the number of worker goroutines from 16 to 8.
  • Tracks consecutive ping successes and failures per peer to adjust backoff dynamically.
  • Resets backoff counters on reconnect and uses a generation counter to discard stale ping results from in-flight pings that were dispatched before the reconnect.
  • Uses a far-future sentinel (time.Hour) for in-flight peers instead of the base retry duration, preventing re-dispatch while a ping is active.
  • Adds Prometheus metrics for the reacher: queue depth (bee_reacher_peers), ping attempt/error counters, and a ping duration histogram with buckets tuned for the 15s ping timeout.
  • Updates and expands unit tests to cover backoff, cap, reconnect-reset, and jitter logic. Tests disable jitter (JitterFactor: 0) for deterministic timing under synctest.

gacevicljubisa requested a review from janos on February 17, 2026, 10:56