Environment
- Tekton Operator version: v0.78.1
- Kubernetes version: v1.33
- Platform: Kubernetes (bare-metal)
Description
The tekton-operator-proxy-webhook Service uses name: tekton-operator as
its pod selector. This label is also present on pods of the main
tekton-operator Deployment. As a result, the Service load-balances admission
webhook traffic across both Deployments, even though tekton-operator
pods do not listen on port 8443.
This causes approximately 50% of all admission webhook requests to fail
with connection refused. Because the MutatingWebhookConfiguration has
failurePolicy: Fail, each failed webhook call immediately rejects the
creation of the TaskRun pod being admitted.
Steps to Reproduce
- Deploy Tekton Operator v0.78.1 on a Kubernetes cluster.
- Inspect the endpoints of the
tekton-operator-proxy-webhook Service:
kubectl get endpoints tekton-operator-proxy-webhook -n <namespace>
- Observe that the Endpoints list includes pods from both
tekton-operator and tekton-operator-proxy-webhook Deployments.
- Trigger any Pipeline/TaskRun. Observe that roughly half of new TaskRun pod
creation attempts fail immediately.
Expected Behavior
The tekton-operator-proxy-webhook Service should only route traffic to
tekton-operator-proxy-webhook pods (port 8443). The tekton-operator pods
should never appear in this Service's Endpoints.
Actual Behavior
The Service Endpoints include pods from both Deployments:
# kubectl get endpoints tekton-operator-proxy-webhook -n tekton -o yaml
subsets:
- addresses:
  - ip: 172.26.0.66   # tekton-operator-proxy-webhook pod ✅ serves on 8443
  - ip: 172.26.1.157  # tekton-operator pod ❌ does not serve on 8443
  ports:
  - port: 8443
Stress-testing the Service directly via its ClusterIP (20 requests; first
10 shown) showed roughly 50% failing with connection refused:
req-1: PASS (HTTP 415) req-2: FAIL (000) req-3: FAIL (000)
req-4: FAIL (000) req-5: FAIL (000) req-6: PASS (HTTP 415)
req-7: PASS (HTTP 415) req-8: FAIL (000) req-9: PASS (HTTP 415)
req-10: FAIL (000)
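The pass/fail pattern above can be reproduced with a small loop against the Service. This is a sketch: the `stress` helper name, the example URL, and the request path are assumptions for illustration, not part of the operator.

```shell
# stress N URL : send N requests to URL and report PASS/FAIL per request.
# curl prints "000" as the HTTP code when no HTTP response was received
# (e.g. connection refused), which is exactly the failure mode here.
stress() {
  n="$1"; url="$2"; i=1
  while [ "$i" -le "$n" ]; do
    code=$(curl -k -s -o /dev/null --max-time 2 -w '%{http_code}' "$url" || true)
    if [ "$code" = "000" ]; then
      echo "req-$i: FAIL (000)"
    else
      echo "req-$i: PASS (HTTP $code)"
    fi
    i=$((i + 1))
  done
}

# Example (run from inside the cluster; URL is hypothetical):
# stress 20 "https://tekton-operator-proxy-webhook.tekton.svc:443/defaulting?timeout=10s"
```

A healthy endpoint answers with an HTTP status (415 for a bare GET without an AdmissionReview body), while a tekton-operator pod, which has no listener on 8443, refuses the connection.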
The failure manifests as the following error when creating TaskRun pods:
failed to create task run pod "<pod-name>":
Internal error occurred: failed calling webhook "proxy.operator.tekton.dev":
failed to call webhook:
Post "https://tekton-operator-proxy-webhook.<ns>.svc:443/defaulting?timeout=10s":
dial tcp <ClusterIP>:443: connect: connection refused
Note: the error appends a misleading hint ("Maybe missing or invalid Task
…") that does not reflect the real cause.
Root Cause
Both Deployments use the same pod template label name: tekton-operator:
tekton-operator Deployment (config/kubernetes/base/operator.yaml):
selector:
  matchLabels:
    name: tekton-operator   # ← same label
template:
  metadata:
    labels:
      name: tekton-operator # ← same label
tekton-operator-proxy-webhook Deployment
(cmd/kubernetes/operator/kodata/webhook/webhook.yaml):
selector:
  matchLabels:
    name: tekton-operator   # ← collision!
template:
  metadata:
    labels:
      name: tekton-operator # ← collision!
tekton-operator-proxy-webhook Service:
selector:
  name: tekton-operator     # ← matches both Deployments!
The same issue exists in the OpenShift manifest
(cmd/openshift/operator/kodata/webhook/webhook.yaml).
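The collision follows directly from Kubernetes label-selector semantics: a Service selects every pod whose labels contain all of the selector's key/value pairs. A minimal Python sketch (pod names are hypothetical) illustrating both the collision and the effect of relabeling:

```python
def matches(selector: dict, pod_labels: dict) -> bool:
    """A Service selector matches a pod iff every selector key/value
    pair appears in the pod's labels (subset semantics)."""
    return all(pod_labels.get(k) == v for k, v in selector.items())

# Hypothetical pods mirroring the two Deployments' pod templates.
pods = {
    "tekton-operator-xyz":               {"app": "tekton-operator", "name": "tekton-operator"},
    "tekton-operator-proxy-webhook-abc": {"app": "tekton-operator", "name": "tekton-operator"},
}

service_selector = {"name": "tekton-operator"}
endpoints = [p for p, labels in pods.items() if matches(service_selector, labels)]
print(endpoints)  # both pods land in the Service's Endpoints

# With the proposed relabeling, only the webhook pod matches:
pods["tekton-operator-proxy-webhook-abc"]["name"] = "tekton-operator-proxy-webhook"
fixed_selector = {"name": "tekton-operator-proxy-webhook"}
fixed_endpoints = [p for p, labels in pods.items() if matches(fixed_selector, labels)]
print(fixed_endpoints)  # only the proxy-webhook pod remains
```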
Proposed Fix
Change the proxy-webhook Deployment's matchLabels selector and pod template
label from name: tekton-operator to name: tekton-operator-proxy-webhook,
and update the Service selector to match. The existing app: tekton-operator
label is left unchanged.
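A sketch of the intended change (field layout taken from the snippets above; the actual patch may differ):

```yaml
# cmd/kubernetes/operator/kodata/webhook/webhook.yaml (sketch)
# Deployment:
selector:
  matchLabels:
    name: tekton-operator-proxy-webhook   # was: tekton-operator
template:
  metadata:
    labels:
      name: tekton-operator-proxy-webhook # was: tekton-operator
---
# Service:
selector:
  name: tekton-operator-proxy-webhook     # was: tekton-operator
```

Note that spec.selector on an existing Deployment is immutable, so applying this change to a live installation requires recreating the proxy-webhook Deployment rather than patching it in place.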
I have a patch ready and will submit a PR.