Support nodeKey on all SeiNode spec types (fullNode, archive, replayer) #141

@bdchatham

Problem

PR #140 added validator.nodeKey to support stable P2P node identity for migrating validators, but the field lives only on ValidatorSpec. Other SeiNode modes — fullNode, archive, and replayer — auto-generate a fresh node_key.json on every PVC creation, giving them a new node ID on every fresh deployment. Operators migrating fullNodes (RPC, sentry) or archive nodes from EC2 to K8s face the same impersonation-signal problem the validator feature solved: their permanent libp2p identity changes on cutover, persistent_peers entries elsewhere in the network become stale, peer reputation accumulated on the EC2 host is lost, and any topology tracker mapping node IDs to monikers shows a discontinuity.

Impact

  • fullNode migrations break sentry topology. Sentries are often pinned in validators' private_peer_ids configs by node ID. A new node ID on the K8s sentry silently breaks the gossip-shielding link to the validator. Same breakage shape as the validator case, potentially affecting more infrastructure since fullNodes are typically the public-facing tier.
  • Archive nodes and snapshotters lose accumulated peer reputation. These nodes are long-lived; the network has learned which IDs to trust and route to. Regenerating the node key discards that history.
  • Inconsistent migration UX. Validators get a clean cutover path via PR #140 (feat: validator nodeKey configuration on SeiNode); every other mode has to either regenerate node IDs (operationally painful) or accept the loss (operationally inconvenient).
  • Operators today work around it manually by scraping node_key.json from the EC2 host and kubectl cp-ing it onto the new PVC after deployment — out-of-band, no audit trail, no immutability guarantee.

Relevant experts

  • kubernetes-specialist — pod-spec mutation across all modes; bootstrap-vs-production policy uniformity
  • platform-engineer — API placement (top-level SeiNodeSpec.nodeKey vs. per-mode duplication); cross-field coupling rules with the existing validator signingKey
  • product-manager — scope cut: which modes ship in v1, what to defer

Proposed approach

The cleanest API is to promote nodeKey from ValidatorSpec up to SeiNodeSpec — node identity is orthogonal to mode (every Tendermint node has a node_key.json regardless of role). This is technically a breaking change to the just-shipped validator.nodeKey field, but PR #140 hasn't been live long enough for any operator to depend on it; the migration cost is essentially zero if we move now.

// api/v1alpha1/seinode_types.go
type SeiNodeSpec struct {
    // existing fields...

    // NodeKey declares the source of this node's P2P node key
    // (node_key.json). Universal across modes — fullNode, archive,
    // replayer, and validator. When omitted, seid auto-generates a
    // fresh node_key.json on first start.
    // +optional
    NodeKey *NodeKeySource `json:"nodeKey,omitempty"`
}

ValidatorSpec loses its NodeKey field; SigningKey stays where it is. The signingKey ↔ nodeKey coupling rule lifts to SeiNodeSpec level:

!has(self.validator) || !has(self.validator.signingKey) || has(self.nodeKey)

(Reads as: "If validator.signingKey is set, nodeKey must be set." validatorPlanner.Validate gets the same Go-side mirror.)
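A minimal sketch of what the Go-side mirror of that CEL rule could look like. The struct definitions are trimmed-down, hypothetical stand-ins for the real CRD types (only the fields the rule touches are modeled), and validateKeyCoupling is an illustrative name, not the actual body of validatorPlanner.Validate.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the CRD types; the real definitions live in
// api/v1alpha1 and carry more fields than shown here.
type SecretRef struct{ SecretName string }
type NodeKeySource struct{ Secret *SecretRef }
type SigningKeySource struct{ Secret *SecretRef }
type ValidatorSpec struct{ SigningKey *SigningKeySource }
type SeiNodeSpec struct {
	Validator *ValidatorSpec
	NodeKey   *NodeKeySource
}

var errNodeKeyRequired = errors.New("spec.nodeKey must be set when spec.validator.signingKey is set")

// validateKeyCoupling mirrors the CEL rule
//   !has(self.validator) || !has(self.validator.signingKey) || has(self.nodeKey)
// by returning an error only when signingKey is set and nodeKey is not.
func validateKeyCoupling(spec SeiNodeSpec) error {
	if spec.Validator == nil || spec.Validator.SigningKey == nil {
		return nil // rule does not apply
	}
	if spec.NodeKey == nil {
		return errNodeKeyRequired
	}
	return nil
}

func main() {
	// signingKey without nodeKey: rejected.
	bad := SeiNodeSpec{Validator: &ValidatorSpec{SigningKey: &SigningKeySource{}}}
	fmt.Println(validateKeyCoupling(bad))

	// Same spec with a top-level nodeKey: accepted.
	good := bad
	good.NodeKey = &NodeKeySource{Secret: &SecretRef{SecretName: "node-key"}}
	fmt.Println(validateKeyCoupling(good))
}
```

Keeping the Go check structurally identical to the CEL expression makes it easier to confirm the two stay in sync when either changes.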

Pod-spec wiring is mostly a path change. buildNodePodSpec already calls nodeKeyVolumes(node) / nodeKeyMounts(node). Both helpers go through nodeKeySecretSource, which reads node.Spec.Validator.NodeKey today and would read node.Spec.NodeKey instead. That is one field-path change in the helper; callers are unchanged.
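The field-path change might look roughly like the sketch below. The real signature and return type of nodeKeySecretSource are not shown in this issue, so the (string, bool) return and the simplified types are assumptions made for illustration; the nodeKey.secret.secretName shape is taken from the "Out of scope" notes.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the CRD types.
type SecretRef struct{ SecretName string }
type NodeKeySource struct{ Secret *SecretRef }
type SeiNodeSpec struct {
	// NodeKey sits at the top level, so the helper no longer needs to
	// look inside a mode-specific sub-spec.
	NodeKey *NodeKeySource
}
type SeiNode struct{ Spec SeiNodeSpec }

// nodeKeySecretSource returns the Secret name backing node_key.json and
// whether one is configured. Before the move this would have inspected
// node.Spec.Validator.NodeKey; after it, node.Spec.NodeKey.
func nodeKeySecretSource(node *SeiNode) (string, bool) {
	if node == nil || node.Spec.NodeKey == nil || node.Spec.NodeKey.Secret == nil {
		return "", false
	}
	return node.Spec.NodeKey.Secret.SecretName, true
}

func main() {
	// A non-validator node with a nodeKey now resolves a Secret.
	sentry := &SeiNode{Spec: SeiNodeSpec{
		NodeKey: &NodeKeySource{Secret: &SecretRef{SecretName: "sentry-0-node-key"}},
	}}
	name, ok := nodeKeySecretSource(sentry)
	fmt.Println(name, ok)

	// With nodeKey omitted, seid keeps auto-generating the key.
	_, ok = nodeKeySecretSource(&SeiNode{})
	fmt.Println(ok)
}
```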

Bootstrap-pod policy: production-only across all modes. The same peer-reputation argument that produced the production-only mount for validators applies to other modes — bootstrap pods crash, halt, restart, and their misbehavior shouldn't attribute to the node's permanent libp2p identity.
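One way to express the production-only policy is to gate the volume helper on the kind of pod being built. This is a sketch under assumptions: the podRole type, the nodeKeyVolumesFor name, and the simplified volume struct are all hypothetical, and the existing code may enforce the policy differently (for example by never calling the helper on bootstrap paths).

```go
package main

import "fmt"

// podRole is a hypothetical discriminator between the two pod kinds.
type podRole int

const (
	roleProduction podRole = iota
	roleBootstrap
)

// Simplified stand-ins for the CRD and pod-spec types.
type SecretRef struct{ SecretName string }
type NodeKeySource struct{ Secret *SecretRef }
type SeiNode struct{ NodeKey *NodeKeySource }
type volume struct{ Name, SecretName string }

// nodeKeyVolumesFor returns the node-key volume only for production pods.
// Bootstrap pods always get nil, even when nodeKey is set, so their
// crashes and restarts are not attributed to the permanent node ID.
func nodeKeyVolumesFor(node *SeiNode, role podRole) []volume {
	if role == roleBootstrap {
		return nil
	}
	if node == nil || node.NodeKey == nil || node.NodeKey.Secret == nil {
		return nil
	}
	return []volume{{Name: "node-key", SecretName: node.NodeKey.Secret.SecretName}}
}

func main() {
	node := &SeiNode{NodeKey: &NodeKeySource{Secret: &SecretRef{SecretName: "archive-0-node-key"}}}
	fmt.Println(len(nodeKeyVolumesFor(node, roleProduction)))
	fmt.Println(len(nodeKeyVolumesFor(node, roleBootstrap)))
}
```

Because the gate does not look at the node's mode, the same check covers fullNode, archive, replayer, and validator without per-mode branches.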

Replayer caveat. Replayers are ephemeral by design — restore from snapshot, replay, exit. They typically don't need a stable node ID. Allow the field structurally (it's optional everywhere) but document in the godoc that replayers usually don't set it.

Acceptance criteria

  • nodeKey moves from ValidatorSpec to SeiNodeSpec; the validator field is removed
  • Pod-spec helpers updated to read from spec.nodeKey instead of spec.validator.nodeKey
  • signingKey ↔ nodeKey coupling rule lifted to SeiNodeSpec CEL; validatorPlanner.Validate mirrors it
  • Pod-spec test coverage: each mode (fullNode, archive, replayer) has a test confirming nodeKey volume + mount when set, and absence when unset
  • Bootstrap-pod regression guard extended: every bootstrap path verifies nodeKey absence even when SeiNodeSpec.nodeKey is set
  • Integration test: fullNode SeiNode with nodeKey set boots production pod with node_key.json mounted at expected path
  • LLD updated (or new mini-LLD added) documenting the cross-mode generalization and the field-relocation breaking change

Out of scope

  • Adding signingKey to non-validator modes. signingKey is consensus-specific and stays on ValidatorSpec.
  • Drift detection for mid-life nodeKey patch on Running nodes. Covered by #137 (Detect spec drift on Running nodes for mid-life SigningKey patch, and future validator mode switch), which would extend naturally to all modes once nodeKey is at SeiNodeSpec level.
  • ConfigMap variant for nodeKey. Defer; Secret-grade is the v1 shape.
  • Auto-rotation. Like signingKey, nodeKey.secret.secretName is immutable; rotation requires delete-and-recreate.
