Consume seid /status halt_intent for upgrade-driven supervisor decisions #134

@bdchatham

Description

Consumer-side tracking issue for the cross-repo upgrade-shutdown contract. Contract is defined in sei-protocol/sei-config — see sei-config#9 and the design doc in sei-config PR#8. Producer-side work tracked at sei-protocol/sei-chain.

Problem

When seid stops, the controller currently has no machine-readable signal explaining why it stopped. It cannot distinguish "operator forgot to upgrade the binary" from "state-machine bug" without log-scraping, which is brittle and racy.

The cross-repo contract being shipped via sei-config + sei-chain provides:

  • Distinct process exit codes (70/71/72) for graceful halt reasons.
  • An optional halt_intent field on seid's /status response carrying a structured HaltIntent (ShutdownReason, PlanName, Height, Info, AnnouncedAt).
  • An opt-in seid flag --halt-stay-alive that keeps the process running with /status serving after consensus halts, so the controller can poll the live signal before the pod terminates.

This issue tracks the controller-side wiring to consume that signal.

Scope

  1. Import sei-config to pick up ShutdownReason, ExitCode* constants, HaltIntent, ParseExitCode. Pin to the matching version once the contract ships.

  2. Poll seid's /status endpoint for the halt_intent field. When present and non-null, parse into the typed HaltIntent.

  3. Branch on ShutdownReason:

    • ShutdownReasonUpgradeRequired (70): operator forgot to upgrade. Look up image mapping for PlanName (see open question below). If mapping exists, patch the Pod template image and recreate. If missing, set status.conditions[Upgrading]=Unknown, reason=ImageMappingMissing, emit a Kubernetes Event, and page on-call.
    • ShutdownReasonBinaryTooNew (71): we shipped a too-new binary. Page immediately. Roll back the image to the CRD's previous status.runningImage. Do not auto-advance.
    • ShutdownReasonDowngradeDetected (72): state is ahead of binary by a completed upgrade — botched rollback. Page immediately. Do not restart with same image; do not auto-pick a newer image. Human required.
  4. Fallback signal: process exit code. For pods where --halt-stay-alive is off (or where the controller missed the live /status window), read status.containerStatuses[*].lastState.terminated.exitCode and apply the same branching via seiconfig.ParseExitCode.

  5. Pod-template guidance. Controller-managed pods should set restartPolicy: Never (so the kubelet doesn't crash-loop a binary that cannot make progress) and either disable cosmovisor or reconcile its presence with the new contract — see open question below.
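The pod-template guidance in step 5 might look like this fragment (illustrative only: the container name, image tag, and command are assumptions; only restartPolicy: Never and --halt-stay-alive come from this issue):

```yaml
spec:
  restartPolicy: Never                   # let the controller, not the kubelet, decide on restart
  containers:
    - name: seid                         # hypothetical container name
      image: ghcr.io/sei/seid:v5.9.0     # hypothetical pinned image
      command: ["seid", "start", "--halt-stay-alive"]
```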

"Done" criteria

  • Controller imports the shipped sei-config version and uses the typed enum throughout the supervisor branch.
  • /status poller wired to read halt_intent, with a graceful-degradation path if the field is absent (older seid version) — fall back to exit-code path.
  • Each ShutdownReason has a documented controller behavior path; emergency paging integrated for 71/72.
  • Image-mapping lookup mechanism designed and implemented (see open question).
  • Integration test simulating each halt scenario end-to-end (seid populates halt_intent → controller observes → correct action taken).

Open questions

  1. Image mapping source. When the controller observes Reason=70 PlanName="v6", where does it look up "v6" → ghcr.io/sei/seid:v6.0.1? CRD field on SeiNode? ConfigMap? Annotation? Pre-stage decision needs a Coral-style design.
  2. Cosmovisor coexistence. If the pod's container also runs cosmovisor, cosmovisor's own os.Exit(0) after a binary swap may mask our distinct exit codes from kubelet. Either disable cosmovisor in controller-managed pods, or define how the two coexist.
  3. Stay-alive grace window. What's the maximum duration the controller will let a stay-alive pod linger before forcing termination? Is this a CRD field, a controller-wide config, or always supervisor-driven?
