fix: make service ready timeout configurable via CARTESI_SERVICE_READY_TIMEOUT #750

Closed
jplgarcia wants to merge 1 commit into cartesi:main from jplgarcia:fix/configurable-service-ready-timeout

@jplgarcia

Problem

DefaultServiceTimeout is hard-coded to 5 seconds. In slow or resource-constrained environments (e.g. fly.io), the inspect-server, and occasionally other services, can take longer than 5 seconds to initialise, so the supervisor kills them and the node fails to start.

Measured on fly.io: inspect-server takes ~21 seconds to become ready.

Fix

Replace the const with a package-level var that reads CARTESI_SERVICE_READY_TIMEOUT from the environment at process start:

var DefaultServiceTimeout = func() time.Duration {
    if v := os.Getenv("CARTESI_SERVICE_READY_TIMEOUT"); v != "" {
        if d, err := time.ParseDuration(v); err == nil && d > 0 {
            return d
        }
    }
    return 5 * time.Second
}()

  • Fully backward-compatible: the 5-second default is unchanged when the variable is not set
  • Works for both ReadyTimeout and StopTimeout fallbacks in SupervisorService.Start()
  • Also affects DefaultServiceTimeout usage in http.go
  • All existing tests pass

Testing

Deployed on fly.io with CARTESI_SERVICE_READY_TIMEOUT=30s:

Service is ready  service=inspect-server    (after ~21s)
All services are ready  service=rollups-node

No "Service timed out" errors were observed.

jplgarcia force-pushed the fix/configurable-service-ready-timeout branch 2 times, most recently from 397083d to d4b4a59 (March 6, 2026 17:13)

The DefaultServiceTimeout constant is hard-coded to 5 seconds, which
causes the inspect-server (and occasionally other services) to be killed
by the supervisor before they finish initialising on slow or
resource-constrained environments such as fly.io.

Replace the constant with a package-level var that reads
CARTESI_SERVICE_READY_TIMEOUT from the environment at process start.
The value must be a valid Go duration string (e.g. "30s", "1m").
If the variable is absent or invalid the original 5-second default is
preserved, so the change is fully backward-compatible.

Tested on fly.io: inspect-server takes ~21 s to become ready; with
CARTESI_SERVICE_READY_TIMEOUT=30s all services report ready and the
node runs normally.