Enable Controller-managed versioned scaling resources with WorkerResourceTemplate #217

Open
carlydf wants to merge 53 commits into main from temporal-worker-owned-resource

@carlydf
Collaborator

@carlydf carlydf commented Mar 6, 2026

What was changed

New CRD: WorkerResourceTemplate (WRT)

A new WorkerResourceTemplate CRD that lets users attach arbitrary namespaced Kubernetes resources (HPAs, PDBs, custom scalers, etc.) to a TemporalWorkerDeployment. The controller creates one copy of the resource per active worker version, with auto-injection of scaleTargetRef and selector.matchLabels to point at the correct versioned Deployment.

Key behaviors:

  • One copy per active Build ID, named {twdName}-{wrtName}-{buildID} (uniquely truncated to 47 chars, DNS-safe)
  • Auto-injects spec.scaleTargetRef (when explicitly null) to reference the versioned Deployment → enables per-version HPA autoscaling, and any other scaler that uses scaleTargetRef
  • Auto-injects selector.matchLabels (when explicitly null) with the correct per-version labels → enables per-version PDB targeting, and arbitrary CRDs that use selector.matchLabels to target versioned Deployments
  • Applied via Server-Side Apply with field manager "temporal-worker-controller"
  • Owner ref on each resource copy points to the versioned Deployment → k8s GC deletes copies when a version sunsets
  • Owner ref on the WorkerResourceTemplate itself points to the TWD → GC cascades on TemporalWorkerDeployment deletion
  • Apply status written back to WorkerResourceTemplate.status.versions[*] (Applied, Message, BuildID)
  • Template variable support: {{ .DeploymentName }}, {{ .TemporalNamespace }}, {{ .BuildID }} in any string field
  • Resource spec lives in spec.template (raw JSON/YAML embedded object)
  • Target TWD referenced via spec.temporalWorkerDeploymentRef.name
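
To illustrate the "explicitly null" rule for `scaleTargetRef` described above, here is a minimal sketch using plain maps. It is not the controller's code: `injectScaleTargetRef` is a hypothetical helper, and the `apps/v1` Deployment reference shape is an assumption based on how HPAs normally target Deployments. The point it shows is that an absent key is left alone, while a key present with a `null` value is filled in.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// injectScaleTargetRef is a hypothetical helper (not taken from the PR).
// It fills spec.scaleTargetRef only when the key is present AND null.
// It reports whether an injection happened.
func injectScaleTargetRef(spec map[string]interface{}, deploymentName string) bool {
	value, present := spec["scaleTargetRef"]
	if !present || value != nil {
		// Absent: the user did not opt in. Non-null: not ours to overwrite.
		return false
	}
	// Assumed reference shape: an apps/v1 Deployment, as HPAs usually expect.
	spec["scaleTargetRef"] = map[string]interface{}{
		"apiVersion": "apps/v1",
		"kind":       "Deployment",
		"name":       deploymentName,
	}
	return true
}

func main() {
	// JSON null decodes to a map entry whose value is nil, so the
	// "explicitly null" case is distinguishable from a missing key.
	raw := []byte(`{"minReplicas": 1, "maxReplicas": 5, "scaleTargetRef": null}`)
	var spec map[string]interface{}
	if err := json.Unmarshal(raw, &spec); err != nil {
		panic(err)
	}
	injected := injectScaleTargetRef(spec, "my-worker-build-abc")
	fmt.Println(injected, spec["scaleTargetRef"])
}
```

The same presence-versus-null distinction would apply to `selector.matchLabels`; only the injected value differs.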

Validating Webhook

A WorkerResourceTemplateValidator webhook enforces:

  • apiVersion and kind required; metadata.name/metadata.namespace forbidden (controller sets these)
  • Allowed resource kinds configurable via ALLOWED_KINDS env var (default: HorizontalPodAutoscaler)
  • minReplicas ≠ 0 (currently required so that approximate_task_queue_backlog metric-based autoscaling keeps working when the queue is idle; we plan to relax this in the future)
  • scaleTargetRef and selector.matchLabels must be absent or null (controller owns injection)
  • SAR check: requesting user must be able to create/update the embedded resource type
  • SAR check: controller service account must be able to create/update the embedded resource type
  • spec.temporalWorkerDeploymentRef.name is immutable after creation
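
The pure spec checks in the list above (everything except the SAR and immutability checks) could be sketched roughly as follows. This is an illustration only: `validateTemplate` and its error messages are hypothetical, and it assumes the embedded object has already been decoded from JSON into a generic map, so numbers arrive as `float64`.

```go
package main

import (
	"errors"
	"fmt"
)

// validateTemplate is a hypothetical sketch of the spec-only webhook rules.
// obj is the embedded resource decoded from JSON into a generic map.
func validateTemplate(obj map[string]interface{}) error {
	var errs []error

	// apiVersion and kind are required.
	for _, key := range []string{"apiVersion", "kind"} {
		if s, ok := obj[key].(string); !ok || s == "" {
			errs = append(errs, fmt.Errorf("%s is required", key))
		}
	}

	// metadata.name and metadata.namespace are set by the controller.
	if meta, ok := obj["metadata"].(map[string]interface{}); ok {
		for _, key := range []string{"name", "namespace"} {
			if _, set := meta[key]; set {
				errs = append(errs, fmt.Errorf("metadata.%s must not be set", key))
			}
		}
	}

	if spec, ok := obj["spec"].(map[string]interface{}); ok {
		// minReplicas must not be zero (JSON numbers decode as float64).
		if n, ok := spec["minReplicas"].(float64); ok && n == 0 {
			errs = append(errs, errors.New("spec.minReplicas must not be 0"))
		}
		// scaleTargetRef must be absent or null; the controller injects it.
		if v, set := spec["scaleTargetRef"]; set && v != nil {
			errs = append(errs, errors.New("spec.scaleTargetRef must be absent or null"))
		}
	}

	// errors.Join returns nil when no errors were collected.
	return errors.Join(errs...)
}

func main() {
	obj := map[string]interface{}{
		"apiVersion": "autoscaling/v2",
		"kind":       "HorizontalPodAutoscaler",
		"metadata":   map[string]interface{}{"name": "user-chosen"},
		"spec":       map[string]interface{}{"minReplicas": float64(0), "scaleTargetRef": nil},
	}
	fmt.Println(validateTemplate(obj))
}
```

Collecting all violations before returning, rather than failing on the first, lets a user fix a rejected manifest in one pass.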

Helm chart updates

  • helm/temporal-worker-controller-crds/templates/temporal.io_workerresourcetemplates.yaml (new CRD manifest)
  • helm/temporal-worker-controller/templates/webhook.yaml (always-on WorkerResourceTemplate ValidatingWebhookConfiguration; TemporalWorkerDeployment webhook now behind webhook.enabled)
  • helm/temporal-worker-controller/templates/certmanager.yaml (cert-manager Issuer + Certificate for TLS, default enabled)
  • helm/temporal-worker-controller/Chart.yaml (cert-manager added as optional subchart dependency; opt in via certmanager.install: true)
  • helm/temporal-worker-controller/templates/manager.yaml (cert volume/port always present; ALLOWED_KINDS, POD_NAMESPACE, SERVICE_ACCOUNT_NAME env vars)
  • helm/temporal-worker-controller/templates/rbac.yaml (WorkerResourceTemplate + SAR rules in manager ClusterRole; editor/viewer roles; configurable attached-resource RBAC)
  • helm/temporal-worker-controller/values.yaml (workerResourceConfig.allowedResources default: HPA, piped to ALLOWED_KINDS and to controller rbac)

Integration tests

New integration test subtests added to the existing envtest suite, all running through the shared testTemporalWorkerDeploymentCreation table-test runner:

  • WorkerResourceTemplate (7 tests): Deployment owner ref, matchLabels injection, multiple WorkerResourceTemplates on same TemporalWorkerDeployment, template variable rendering, multiple active versions, apply failure → Applied:false, SSA idempotency
  • Rollout gaps (5 tests): Progressive ramp to Current, ConnectionSpecHash annotation repair, gate input from ConfigMap, gate input from Secret, multiple deprecated versions
  • Webhook admission (5 tests, separate Ginkgo suite): Spec rejection, SAR pass, SAR fail (user), SAR fail (controller SA), temporalWorkerDeploymentRef.name immutability

Why?

HPA autoscaling for versioned Temporal workers requires a separate HPA per worker version, each targeting only that version's Deployment with the correct scaleTargetRef and label selectors. Without this CRD, users have no way to create per-version resources that the controller lifecycle-manages alongside the versioned Deployments.

Checklist

  1. Closes Enable CRUD of controller-managed scaling objects (and other custom scalers) #207

  2. How was this tested:

    • Full envtest integration test suite: new subtests covering WRT lifecycle, previously uncovered rollout scenarios, and webhook admission via the real HTTP admission path
    • Unit tests: webhook validator, SSA naming/injection helpers, planner integration
    • All tests pass: KUBEBUILDER_ASSETS=.../bin/k8s/1.27.1-darwin-arm64 go test -tags test_dep ./...
  3. Any docs updates needed?

    • docs/test-coverage-analysis.md updated to reflect all newly covered test scenarios*
      - will delete this before merging, but leaving it up during PR review
    • docs/wrt.md added: concept overview, HPA example with cert-manager setup, RBAC configuration guide

carlydf and others added 26 commits February 26, 2026 15:53
…ned Deployments

Introduces a new `TemporalWorkerOwnedResource` (TWOR) CRD that lets users attach
arbitrary namespaced Kubernetes resources (HPA, PDB, WPA, custom CRDs, etc.) to
each per-Build-ID versioned Deployment managed by a TemporalWorkerDeployment.

Key design points:
- One copy of the attached resource is created per active Build ID, owned by the
  corresponding versioned Deployment — Kubernetes GC deletes it automatically when
  the Deployment is removed, requiring no explicit cleanup logic.
- Resources are applied via Server-Side Apply (create-or-update), so the controller
  is idempotent and co-exists safely with other field managers (e.g. the HPA controller).
- Two-layer auto-population for well-known fields:
    Layer 1: `scaleTargetRef: null` and `matchLabels: null` in spec.object are
             auto-injected with the versioned Deployment's identity and selector labels.
    Layer 2: Go template expressions (`{{ .DeploymentName }}`, `{{ .BuildID }}`,
             `{{ .Namespace }}`) are rendered in all string values before apply.
- Generated resource names use a hash-suffix scheme (`{prefix}-{8-char-hash}`) to
  guarantee uniqueness per (twdName, tworName, buildID) triple even when the prefix
  is truncated; the buildID is always represented in the hash regardless of name length.
- `ComputeSelectorLabels` is now the single source of truth for selector labels used
  both in Deployment creation and in owned-resource matchLabels injection.
- Partial-failure isolation: all owned resources are attempted on each reconcile even
  if some fail; errors are collected and surfaced together.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
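
The hash-suffix naming scheme described in the commit message above can be sketched as below. This is a guess at the shape of the idea, not the PR's implementation: `computeResourceName` is a hypothetical name, SHA-256 and the `/` separator are assumptions, and the 47-character cap is the figure quoted in the PR description. Sanitizing arbitrary Build ID characters into DNS-safe ones is deliberately left out.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

const (
	maxNameLen = 47 // cap quoted in the PR description
	hashLen    = 8  // "{prefix}-{8-char-hash}"
)

// computeResourceName is a hypothetical sketch: the hash covers the full
// (twdName, wrtName, buildID) triple, so the name stays unique even when
// the human-readable prefix has to be truncated.
func computeResourceName(twdName, wrtName, buildID string) string {
	// Assumed hash input; "/" cannot appear in Kubernetes object names,
	// so it separates the first two parts unambiguously.
	sum := sha256.Sum256([]byte(twdName + "/" + wrtName + "/" + buildID))
	suffix := hex.EncodeToString(sum[:])[:hashLen]

	prefix := strings.ToLower(twdName + "-" + wrtName + "-" + buildID)
	if maxPrefix := maxNameLen - hashLen - 1; len(prefix) > maxPrefix {
		prefix = prefix[:maxPrefix]
	}
	// Avoid "--" or ".-" where the prefix meets the hash.
	prefix = strings.TrimRight(prefix, "-.")
	return prefix + "-" + suffix
}

func main() {
	fmt.Println(computeResourceName("helloworld", "hpa", "v1-2-3"))
	fmt.Println(computeResourceName("a-very-long-temporal-worker-deployment-name", "backlog-hpa", "build-0001"))
}
```

Because the Build ID always feeds the hash, two versions whose truncated prefixes are identical still get distinct names.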
- Extract getOwnedResourceApplies into planner package so it can be
  tested without a live API client
- Add OwnedResourceApply type and OwnedResourceApplies slice to Plan
- Thread twors []TemporalWorkerOwnedResource through GeneratePlan
- Add TestGetOwnedResourceApplies (8 cases: nil/empty inputs, N×M
  cartesian, nil Raw skipped, invalid template skipped)
- Add TestGetOwnedResourceApplies_ApplyContents (field manager, kind,
  owner reference, deterministic name)
- Add TestGetOwnedResourceApplies_FieldManagerDistinctPerTWOR
- Add two TWOR cases to TestGeneratePlan for end-to-end count check
- Add helpers: createTestTWOR, createDeploymentWithUID,
  createTestTWORWithInvalidTemplate

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Both the controller plan field and the planner Plan field now share the
same name, making the copy-assignment self-documenting:
  plan.ApplyOwnedResources = planResult.ApplyOwnedResources

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Users don't need to template the k8s namespace (they already know it
when creating their TWOR in that namespace). The Temporal namespace is
more useful since it configures where the worker connects to.

- TemplateData.Namespace → TemplateData.TemporalNamespace
- RenderOwnedResource gains a temporalNamespace string parameter
- getOwnedResourceApplies threads the value from
  spec.WorkerOptions.TemporalNamespace down to RenderOwnedResource
- Update all tests: {{ .Namespace }} → {{ .TemporalNamespace }}
- GoTemplateRendering test now uses distinct k8s ns ("k8s-production")
  and Temporal ns ("temporal-production") to make the difference clear

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
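
A minimal sketch of rendering one string value with the three template variables, using the standard `text/template` package. The `TemplateData` field names follow the commit message above; `renderString` is a hypothetical helper and not the PR's `RenderOwnedResource`, whose signature is not shown here.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// TemplateData mirrors the variables named in the PR description.
type TemplateData struct {
	DeploymentName    string
	TemporalNamespace string
	BuildID           string
}

// renderString is a hypothetical helper that renders a single string
// value. Referencing a field that TemplateData does not have (for example
// the removed {{ .Namespace }}) fails at execution time.
func renderString(value string, data TemplateData) (string, error) {
	tmpl, err := template.New("value").Parse(value)
	if err != nil {
		return "", fmt.Errorf("parse template: %w", err)
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, data); err != nil {
		return "", fmt.Errorf("execute template: %w", err)
	}
	return buf.String(), nil
}

func main() {
	data := TemplateData{
		DeploymentName:    "helloworld-v1",
		TemporalNamespace: "temporal-production",
		BuildID:           "v1",
	}
	out, err := renderString("{{ .TemporalNamespace }}/{{ .DeploymentName }}:{{ .BuildID }}", data)
	fmt.Println(out, err)
}
```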
Implements the admission webhook for TemporalWorkerOwnedResource with:
- Pure spec validation: apiVersion/kind required, metadata.name/namespace
  forbidden, banned kinds (Deployment/StatefulSet/Job/Pod/CronJob by default),
  minReplicas≠0, scaleTargetRef/matchLabels absent-or-null enforcement
- API checks: RESTMapper namespace-scope assertion, SubjectAccessReview for
  the requesting user and controller SA (with correct SA group memberships)
- ValidateUpdate enforces workerRef.name immutability and uses verb="update"
- ValidateDelete checks delete permissions on the underlying resource
- Helm chart: injects POD_NAMESPACE and SERVICE_ACCOUNT_NAME via downward API,
  BANNED_KINDS from ownedResources.bannedKinds values

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- cmd/main.go: register TemporalWorkerOwnedResourceValidator unconditionally
- webhook.yaml: rewrite to always create the webhook Service and TWOR
  ValidatingWebhookConfiguration; TWD validating webhook remains optional
  behind webhook.enabled
- certmanager.yaml: fix service DNS names, remove fail guard, default enabled
- manager.yaml: move cert volume mount and webhook port outside the
  webhook.enabled gate so the webhook server always starts
- values.yaml: default certmanager.enabled to true, clarify that
  webhook.enabled only controls the optional TWD webhook

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add helm/crds/temporal.io_temporalworkerownedresources.yaml so Helm
  installs the CRD before the controller starts
- Add temporalworkerownedresources get/list/watch/patch/update rules to
  the manager ClusterRole so the controller can watch and update status
- Add authorization.k8s.io/subjectaccessreviews create permission for
  the validating webhook's SubjectAccessReview checks
- Add editor and viewer ClusterRoles for end-user RBAC on TWOR objects

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
TemporalWorkerOwnedResource supports arbitrary user-defined resource
types (HPA, PDB, custom CRDs) that are not known at install time.
Add a wildcard rule to the manager ClusterRole so the controller can
create/get/patch/update/delete any namespaced resource on behalf of
TWOR objects.

Security note: the TWOR validating webhook is a required admission
control that verifies the requesting user has permission on the
embedded resource type before the TWOR is admitted, so the
controller's broad permissions act as executor, not gatekeeper.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the wildcard ClusterRole rule with a configurable list of
explicit resource type rules. Default to HPA and PDB — the two primary
documented TWOR use cases. Wildcard mode is still available as an
opt-in via ownedResources.rbac.wildcard=true for development clusters
or when users attach many different custom CRD types.

Operators add entries to ownedResources.rbac.rules for each additional
API group their TWOR objects will use (e.g. keda.sh/scaledobjects).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Consolidates bannedKinds and rbac under a single top-level key for
clarity. Update all template references accordingly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add TWORName, TWORNamespace, BuildID to OwnedResourceApply so the
executor knows which status entry to update after each apply attempt.

Refactor the apply loop in execplan.go to collect per-(TWOR, BuildID)
results (success or error) and then, after all applies complete, write
OwnedResourceVersionStatus entries back to each TWOR's status
subresource. This means:

- Applied=true + ResourceName set on success
- Applied=false + Message set on failure
- All Build IDs for a TWOR are written atomically in one status update
- Apply errors and status write errors are both returned via errors.Join
  so the reconcile loop retries on either kind of failure

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
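
The collect-then-report pattern described in the commit message above might look roughly like this. All names here (`versionStatus`, `applyAll`, `failOn`) are hypothetical stand-ins; the real code writes `OwnedResourceVersionStatus` entries to the status subresource, which this sketch replaces with a returned slice.

```go
package main

import (
	"errors"
	"fmt"
)

// versionStatus is a simplified stand-in for the per-Build-ID status entry.
type versionStatus struct {
	BuildID      string
	Applied      bool
	ResourceName string
	Message      string
}

// applyAll attempts every Build ID even if some fail, records one status
// entry per attempt, and returns all failures joined into a single error
// so the caller can retry.
func applyAll(buildIDs []string, apply func(buildID string) (string, error)) ([]versionStatus, error) {
	statuses := make([]versionStatus, 0, len(buildIDs))
	var errs []error
	for _, id := range buildIDs {
		name, err := apply(id)
		if err != nil {
			statuses = append(statuses, versionStatus{BuildID: id, Applied: false, Message: err.Error()})
			errs = append(errs, fmt.Errorf("apply for build %s: %w", id, err))
			continue
		}
		statuses = append(statuses, versionStatus{BuildID: id, Applied: true, ResourceName: name})
	}
	return statuses, errors.Join(errs...)
}

// failOn returns a fake apply function that fails for one Build ID.
func failOn(bad string) func(string) (string, error) {
	return func(buildID string) (string, error) {
		if buildID == bad {
			return "", errors.New("simulated apply failure")
		}
		return "hpa-" + buildID, nil
	}
}

func main() {
	statuses, err := applyAll([]string{"v1", "v2", "v3"}, failOn("v2"))
	for _, s := range statuses {
		fmt.Printf("%+v\n", s)
	}
	fmt.Println("error:", err)
}
```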
Replace the double-nested errors.Join with a single call over the
concatenated slice, which is equivalent and more readable.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When a TemporalWorkerDeployment is reconciled, ensure each
TemporalWorkerOwnedResource referencing it has an owner reference
pointing back to the TWD (controller: true). This lets Kubernetes
garbage-collect TWOR objects automatically when the TWD is deleted.

The patch is skipped when the reference is already present (checked via
metav1.IsControlledBy) to avoid a write on every reconcile loop.
Uses client.MergeFrom to avoid conflicts with concurrent modifications.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
genplan should only read state and build a plan — not perform writes.
Instead of patching TWORs directly in generatePlan, build (base,
patched) pairs in genplan.go (pure computation) and let executePlan
apply them, consistent with how all other writes are structured.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add TWOROwnerRefPatch type and EnsureTWOROwnerRefs to planner.Plan
- Add getTWOROwnerRefPatches to planner package, unit-tested in
  planner_test.go (TestGetTWOROwnerRefPatches)
- GeneratePlan now accepts a twdOwnerRef and populates
  EnsureTWOROwnerRefs; genplan.go builds the OwnerReference from the
  TWD object and passes it through
- Owner ref patch failures in execplan.go now log-and-continue so that
  a deleted TWOR (race between list and patch) cannot block the more
  important owned-resource apply step

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ing it pre-built

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…bhook validator

- Add +kubebuilder:object:generate=false to TemporalWorkerOwnedResourceValidator
  (client.Client interface field was blocking controller-gen)
- Regenerate zz_generated.deepcopy.go: adds GateInputSource, GateWorkflowConfig,
  OwnedResourceVersionStatus deepcopy that were missing from the manual edit
- Regenerate CRD manifests: adds type:object to spec.object in TWOR CRD, field
  ordering change in TWD CRD
- Remove now-unused metav1 import from genplan.go (was missed in prior commit)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Tests the full reconciliation loop: create TWOR with HPA spec → controller
applies one HPA per active Build ID via SSA → asserts scaleTargetRef is
auto-injected with the correct versioned Deployment name → asserts
TWOR.Status.Versions shows Applied: true → asserts TWD controller owner
reference is set on the TWOR.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…functions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…etup

- docs/owned-resources.md: full TWOR reference (auto-injection, RBAC, webhook TLS, examples)
- examples/twor-hpa.yaml: ready-to-apply HPA example for the helloworld demo
- helm/webhook.yaml + values.yaml: add certmanager.caBundle for BYO TLS without cert-manager
- internal/demo/README.md: add cert-manager install step and TWOR demo walkthrough
- README.md + docs/README.md: add cert-manager prerequisite, TWOR feature bullet, and doc link

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Truncate owned resource names to 47 chars (safe for Deployments; avoids
  per-resource-type special cases if Deployment is ever un-banned)
- Fix docs: replace "active Build ID" with "worker version with running workers"
  throughout; "active" is reserved for Ramping/Current versions
- Fix docs: owned-resource deletion is due to versioned Deployment sunset, not
  a separate "version delete" operation
- Fix docs: scaleTargetRef injection applies to any resource type with that field,
  not just HPA; clarify webhook rejects non-null values because controller owns them
- Fix docs: remove undocumented/untested BYO TLS path; cert-manager is required
- Fix docs: expand TWOR abbreviation to full name throughout; remove ⏳ autoscaling
  bullet from README and clarify TWOR is the path for metric/backlog-based autoscaling
- Add note on how to inspect the banned kinds list (BANNED_KINDS env var)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ission

Implements all envtest-capable test scenarios identified in docs/test-coverage-analysis.md:

**TWOR integration tests (tests 1–7)** in internal/tests/internal/twor_integration_test.go:
- Owner reference on each owned resource points to the corresponding versioned Deployment
- matchLabels auto-injection (null sentinel → selector labels)
- Multiple TWORs on the same TWD each produce independent resources
- Go template variables (DeploymentName, BuildID, TemporalNamespace)
- Multiple active build IDs each get their own owned resource instance
- Partial SSA failure is isolated per resource (other versions still apply)
- SSA idempotency: repeated reconciles produce no spurious updates

**Rollout integration tests (tests 8, 9, 10, 13)** in internal/tests/internal/rollout_integration_test.go:
- Progressive rollout auto-promotes to Current after the 30s pause expires
- Controller repairs stale ConnectionSpecHashAnnotation on a versioned Deployment
- Gate input from ConfigMap: blocks Deployment creation until ConfigMap exists
- Three successive rollouts accumulate two deprecated versions in status

**Webhook integration tests (tests 14–18)** in api/v1alpha1/temporalworkerownedresource_webhook_integration_test.go:
- Banned kind rejected via the real HTTP admission path
- SAR pass: admin user + controller SA with HPA RBAC → creation allowed
- SAR fail: impersonated user without HPA permission → rejected
- SAR fail: controller SA without HPA RBAC → rejected
- workerRef.name immutability enforced via real HTTP update

Supporting changes:
- config/webhook/manifests.yaml: add TWOR ValidatingWebhookConfiguration
- api/v1alpha1/webhook_suite_test.go: register TWOR webhook; add corev1/rbacv1/authorizationv1 to scheme; set controller SA env vars
- docs/test-coverage-analysis.md: corrections to webhook and autoscaling sections

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…de review feedback

- Convert runTWORTests and runRolloutTests into tworTestCases()/rolloutTestCases()
  table slices that run through the standard testTemporalWorkerDeploymentCreation runner,
  eliminating duplicate validation paths for TWD status and Temporal state
- Add WithTWDMutatorFunc and WithPostTWDCreateFunc hooks to TestCaseBuilder to support
  gate-ConfigMap blocking and other pre/post-create mutations without polluting the
  builder API
- Remove all test-number references (tests 1–7, Test 8, etc.) from comments; replace
  with self-contained scenario descriptions that don't depend on the coverage doc
- Improve WithValidatorFunction doc comment to precisely state execution order: runs
  after both verifyTemporalWorkerDeploymentStatusEventually and
  verifyTemporalStateMatchesStatusEventually have confirmed the expected TWD state

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Mark TWOR gaps 1-7, rollout gaps 8-10/13, and webhook tests 14-18 as
implemented. Update subtest counts (28→39 integration, 0→5 webhook suite).
Update priority recommendations to reflect remaining gaps.

Mirror of gate-input-from-configmap using SecretKeyRef. Controller blocks
Deployment creation while the Secret is absent; creating the Secret unblocks
the reconcile loop and the version promotes to Current.

Also updates the test-coverage-analysis.md to mark Gate input from Secret as
covered (was the last open envtest gap).
@carlydf carlydf requested review from a team and jlegrone as code owners March 6, 2026 04:30
@@ -0,0 +1,372 @@
# Test Coverage Analysis
Collaborator Author

I will be deleting this!

Also, it is AI generated, and I have only read the top half carefully. Use it for your information during review ("The Controller's Contract" might be an especially helpful section for us all to understand and agree on), but don't spend too much time on the details here. Not intended to be a part of the main repo.

Without this, 'go test ./...' in CI (where etcd is not installed) caused
BeforeSuite to crash immediately instead of skipping gracefully.

- BeforeSuite: Skip() early when KUBEBUILDER_ASSETS is unset
- AfterSuite: return early when testEnv is nil (BeforeSuite was skipped)
- test-unit: add envtest prerequisite and set KUBEBUILDER_ASSETS so the
  webhook integration tests actually run (not skip) in the unit test job
carlydf added 3 commits March 6, 2026 12:32
- use-errors-new: replace fmt.Errorf with errors.New for static error
  string in worker_controller.go TWOR field indexer

- unchecked-type-assertion (ownedresources.go):
  * rendered.(map[string]interface{}) → two-value form + error return
  * raw["metadata"].(map[string]interface{}) → check ok, skip nil branch
  * meta["labels"].(map[string]interface{}) → check ok, skip nil branch

- cognitive-complexity: split autoInjectFields (complexity 33) into
  three focused helpers:
  * buildScaleTargetRef — constructs the scaleTargetRef map
  * labelsAsInterface   — converts string labels to interface map
  * autoInjectInValue   — handles recursion into nested maps and slices
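
For context on the unchecked-type-assertion fixes listed above, the two-value form looks like this. `labelsOf` is a hypothetical helper written for illustration; it only demonstrates the pattern of checking `ok` at each level instead of letting a single-value assertion panic on a missing or mistyped field.

```go
package main

import "fmt"

// labelsOf is a hypothetical helper showing the checked (two-value) type
// assertion: each level reports ok=false instead of panicking when the
// field is missing, nil, or of an unexpected type.
func labelsOf(raw map[string]interface{}) (map[string]interface{}, bool) {
	meta, ok := raw["metadata"].(map[string]interface{})
	if !ok {
		return nil, false
	}
	labels, ok := meta["labels"].(map[string]interface{})
	if !ok {
		return nil, false
	}
	return labels, true
}

func main() {
	withLabels := map[string]interface{}{
		"metadata": map[string]interface{}{
			"labels": map[string]interface{}{"app": "worker"},
		},
	}
	fmt.Println(labelsOf(withLabels))

	// A single-value assertion such as raw["metadata"].(map[string]interface{})
	// would panic on this input; the checked form just reports false.
	fmt.Println(labelsOf(map[string]interface{}{"metadata": nil}))
}
```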
@jaypipes jaypipes left a comment

@carlydf I realize Claude wrote the majority of this code, so take this review partly as a critique of the style of code that Claude produces :)

Comment on lines +33 to +36
// String values in the spec may contain Go template expressions:
// {{ .DeploymentName }} - the controller-generated versioned Deployment name
// {{ .TemporalNamespace }} - the Temporal namespace the worker connects to
// {{ .BuildID }} - the Build ID for this version

Allowing template variable interpolation is a dangerous game to get into because you now need to guard on the controller side (as opposed to the API machinery/schema-side) against malformed or malicious content. I'd recommend not supporting template variable interpolation for this first PR.

Collaborator Author

I was about to remove this, but then realized that I think our basic HPA autoscaling will need it (to add the correct version tag to each approximate_backlog_count metric that we want to scale off of).

Could do in a separate PR though to make it easier to revert / review separately

carlydf and others added 14 commits March 18, 2026 01:11
Switch the TWOR kind restriction from a deny-list (BANNED_KINDS) to an
allow-list (ALLOWED_KINDS / ownedResourceConfig.allowedResources). An
empty allow-list now rejects all kinds instead of permitting everything,
making the safe default explicit and operator-configured.

The allowedResources Helm value unifies what was previously two separate
configs (bannedKinds + rbac.rules) into a single list of {kinds,
apiGroups, resources} entries that drives both the webhook kind check and
the controller's RBAC policy. The wildcard RBAC escape-hatch is removed;
RBAC is always derived from allowedResources. The default ships with
HorizontalPodAutoscaler allowed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
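
The difference between the old deny-list and the new allow-list comes down to what an empty list means. A small sketch of the allow-list side follows; the comma-separated format of `ALLOWED_KINDS` and the exact-match comparison are assumptions, and both function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// parseAllowedKinds splits an ALLOWED_KINDS-style value into kinds.
// The comma-separated format is an assumption, not confirmed by the PR.
func parseAllowedKinds(env string) []string {
	var kinds []string
	for _, part := range strings.Split(env, ",") {
		if kind := strings.TrimSpace(part); kind != "" {
			kinds = append(kinds, kind)
		}
	}
	return kinds
}

// kindAllowed implements allow-list semantics: a kind is permitted only
// if it is listed, so an empty list rejects everything.
func kindAllowed(allowed []string, kind string) bool {
	for _, k := range allowed {
		if k == kind {
			return true
		}
	}
	return false
}

func main() {
	allowed := parseAllowedKinds("HorizontalPodAutoscaler")
	fmt.Println(kindAllowed(allowed, "HorizontalPodAutoscaler"))
	fmt.Println(kindAllowed(allowed, "PodDisruptionBudget"))
	fmt.Println(kindAllowed(parseAllowedKinds(""), "HorizontalPodAutoscaler"))
}
```

With a deny-list, the equivalent empty configuration would have permitted every kind, which is the unsafe default this commit removes.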
The "Temporal" prefix is redundant under the temporal.io/v1alpha1 API
group. Rename the CRD kind, all derivative types, file names, variables,
comments, and Helm chart references throughout the codebase.

Notable renames:
- Kind: TemporalWorkerOwnedResource -> WorkerResourceTemplate
- Short name: twor -> wrt
- Ref type: TemporalWorkerDeploymentReference (field: temporalWorkerDeploymentRef)
- RBAC roles: temporalworkerownedresource-* -> workerresourcetemplate-*
- Files: twor_*.go -> wrt_*.go, ownedresources.go -> workerresourcetemplates.go

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace manual map[string]interface{} manipulation with the unstructured
library's typed helpers and Unstructured object methods:

- Unmarshal spec.template directly into &unstructured.Unstructured{}
- Use NestedFieldNoCopy for live spec access (no deep copy or write-back)
- Use obj.SetName/SetNamespace/GetLabels/SetLabels/SetOwnerReferences
  instead of manual nested map construction
- Use SetNestedMap/SetNestedStringMap in autoInjectFields
- Remove labelsAsInterface helper (superseded by SetNestedStringMap)
- Fix bug: spec modifications from autoInjectFields were previously lost
  because NestedMap returns a deep copy that was never written back

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the per-WRT dynamic field manager ("twc/{ns}/{name}") with the
standard Kubernetes pattern: a single constant named after the controller.
Resource names are already unique per WRT via ComputeWorkerResourceTemplateName,
so per-instance field managers were never necessary.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Set certmanager.install=true to have cert-manager deployed as part of
this chart. Defaults to false for clusters where cert-manager is already
installed. Chart.lock pins the resolved version; run
`helm dependency update` to refresh. Downloaded .tgz archives are gitignored.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fixes CI failures after cert-manager was added as a conditional subchart
dependency. The downloaded .tgz is gitignored so CI must fetch it
explicitly before any helm command that reads the chart.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- CRDDirectoryPaths pointed to helm/temporal-worker-controller/crds which
  does not exist; CRDs are in helm/temporal-worker-controller-crds/templates
- Set ErrorIfCRDPathMissing: true so this can't silently fail again
- config/webhook/manifests.yaml was missing a ValidatingWebhookConfiguration
  for WorkerResourceTemplate, so envtest never routed admission requests to
  the webhook; added it alongside the existing TWD mutating config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@carlydf carlydf changed the title from "Temporal worker owned resource" to "Enable Controller-managed versioned scaling resources with WorkerResourceTemplate" Mar 20, 2026
carlydf and others added 7 commits March 19, 2026 21:25
- Remove stale MutatingWebhookConfiguration for TWD (not registered in main.go)
- Rename ValidatingWebhookConfiguration to wrt-validating-webhook-configuration
  so a future twd-validating-webhook-configuration can coexist cleanly

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- execplan.go: fix misleading comment about SSA field manager contention;
  clarify that HPA controller only writes status subresource, not spec
- planner.go: expand EnsureWRTOwnerRefs comment to distinguish it from the
  owner refs set on rendered resource copies (those point copy → versioned
  Deployment; these point WRT → TWD)
- worker_controller.go: revert idempotent RolloutComplete block — that fix
  belongs in a separate PR

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- skaffold.yaml: set certmanager.install=true so cert-manager is installed
  automatically as a Helm subchart instead of requiring a separate install step
- internal/demo/README.md: remove manual cert-manager install step; update
  prerequisites to mention the subchart; fix examples/twor-hpa.yaml ->
  examples/wrt-hpa.yaml

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Comment on lines +49 to +50
// BuildID is the Build ID of the versioned Deployment this status entry refers to.
BuildID string `json:"buildID"`
Collaborator Author

Would this be better as a full version (i.e., with DeploymentName included)?
DeploymentName would be the same for all version statuses in one WorkerResourceTemplate, because one WRT points immutably at one TWD.
