Skip to content

fix(helm): add liveliness and readiness probes to helm chart#143

Merged
alukach merged 4 commits intomainfrom
fix/140/add-k8s-health-probes
Mar 28, 2026
Merged

fix(helm): add liveliness and readiness probes to helm chart#143
alukach merged 4 commits intomainfrom
fix/140/add-k8s-health-probes

Conversation

@alukach
Copy link
Copy Markdown
Member

@alukach alukach commented Mar 9, 2026

What I'm changing

This PR attempts to integrate our existing /healthz endpoint into K8s environments

How I did it

I admittedly know very little about Helm/K8s and reached for Claude Code (Opus 4.6) to complete this. Please review thoroughly

closes #140

@github-actions github-actions bot added the fix label Mar 9, 2026
@alukach alukach force-pushed the fix/140/add-k8s-health-probes branch from b88b60f to 3d046b5 Compare March 9, 2026 04:51
@alukach alukach requested review from batpad and pantierra March 9, 2026 05:14
@alukach alukach marked this pull request as ready for review March 9, 2026 05:46
Copy link
Copy Markdown

@thenav56 thenav56 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @alukach, few concerns from my side.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to enable the health endpoint?

Currently, when running this inside a stac-auth-proxy pod container, it returns a 404:

>>> import httpx
>>> httpx.get("http://localhost:8000/healthz")
<Response [404 Not Found]>

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The healthz path DOES include the root_path, are you using that? ie does http://localhost:8000/stac/healthz work?

@thenav56
Copy link
Copy Markdown

thenav56 commented Mar 9, 2026

Note

Findings from a live environment test (tested in a separate internal cluster):

Config used

    - repoURL: https://github.com/developmentseed/stac-auth-proxy.git
      targetRevision: v1.0.3-rc2
      path: helm/
      helm:
        valuesObject:
          env:
            UPSTREAM_URL: "http://ifrcgo-montandon-eoapi-{{ $i }}-stac:8080"
            OIDC_DISCOVERY_URL: "https://goadmin-stage.ifrc.org/o/.well-known/openid-configuration"
            OVERRIDE_HOST: "0"
            ROOT_PATH: "/stac"
            COLLECTIONS_FILTER_CLS: stac_auth_proxy.montandon_filters:CollectionsFilter
            ITEMS_FILTER_CLS: stac_auth_proxy.montandon_filters:ItemsFilter
          ingress:
            enabled: "true"
            host: "montandon-eoapi-{{ $i }}-auth-proxy.ifrc-go.dev.togglecorp.com"
            className: "nginx"
            tls:
              enabled: "false"
          replicaCount: 1
          extraVolumes:
            - name: filters
              configMap:
                name: stac-auth-proxy-filters
          extraVolumeMounts:
            - name: filters
              mountPath: /app/src/stac_auth_proxy/montandon_filters.py
              subPath: montandon_filters.py
              readOnly: true

When setting ROOT_PATH=""

  • When OIDC_DISCOVERY_URL is down
    • The pod does not exit or retry at the application level (same as before).
    • After some time, Kubernetes deletes the pod due to the health check.
    • Pod status is green after OIDC_DISCOVERY_URL is up
  • When OIDC_DISCOVERY_URL is up
    • Pod status is green after few seconds

When setting ROOT_PATH="/stac"

  • The pod gets stuck in an error state.
    • logs shows 404 for /healthz

How should the health check be handled when ROOT_PATH is defined?

@alukach
Copy link
Copy Markdown
Member Author

alukach commented Mar 9, 2026

The pod does not exit or retry at the application level (same as before).

@thenav56 Am I correct that even though you're using the newer helm chart, that helm chart is still pointing at the latest docker image which is still using v1.0.2 (link):
image

Perhaps we should hardcode this value to the helm chart version?

image:
repository: ghcr.io/developmentseed/stac-auth-proxy
pullPolicy: IfNotPresent
tag: "latest"

@thenav56
Copy link
Copy Markdown

Hey @alukach, I was also wondering the same thing for the image tag

After including this, everything is working as expected:

image

Would it be feasible to expose the health endpoint at /healthz instead of $ROOT_PATH/healthz? This would avoid requiring additional configuration changes on the consumer side.

It was also a bit confusing since all requests under /stac are redirected to the STAC API server, but the /stac/healthz endpoint is not.

If the chart consumer need to include that, then we should also update the documentation and note this as a breaking change in the release.

@thenav56
Copy link
Copy Markdown

👋🏽 @alukach @pantierra
Just a reminder for this PR, let me know if I help on the changes using a fork PR?

@alukach
Copy link
Copy Markdown
Member Author

alukach commented Mar 25, 2026

@thenav56 Apologies for the delay, I was away last week.

Would it be feasible to expose the health endpoint at /healthz instead of $ROOT_PATH/healthz? This would avoid requiring additional configuration changes on the consumer side.

This is a bit more challenging than it would seem. The root path (/stac in your case) is handled by FastAPI and applied to all routes:

app = FastAPI(
openapi_url=None, # Disable OpenAPI schema endpoint, we want to serve upstream's schema
lifespan=build_lifespan(settings=settings),
root_path=settings.root_path,
)

if settings.healthz_prefix:
app.include_router(
HealthzHandler(upstream_url=str(settings.upstream_url)).router,
prefix=settings.healthz_prefix,
)

Removing the root path from the health check endpoint would be challenging. Not saying that we shouldn't do it, moreso it's something we could track in another issue.

type: integer
minimum: 0
description: "Seconds to sleep in preStop hook before SIGTERM, allowing Kubernetes endpoint propagation. Set to 0 to disable."
default: 15
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default value below is 5. Please make sure they are the same, either 5 or 15.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can we make these types of check in a helm chart?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently the schema is ignored in the checks because it is a yaml not json. This should fix. I am not entirely sure if checks would include the standard values, though.

In this PR i just changed the preStopSleepSeconds in values.yaml to 15.

terminationGracePeriodSeconds:
type: integer
minimum: 1
description: "Duration in seconds the pod needs to terminate gracefully. Must be greater than preStopSleepSeconds."
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Must be greater than preStopSleepSeconds.

Can we have a check for this?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a check for this in the helm/templates/_helpers.tpl

@pantierra
Copy link
Copy Markdown
Contributor

Overall looks good to me, only a couple of small things I would improve, see my comments.

@pantierra pantierra force-pushed the fix/140/add-k8s-health-probes branch from 2996543 to 2b70f3b Compare March 27, 2026 15:20
@alukach alukach merged commit ba55f64 into main Mar 28, 2026
4 checks passed
@alukach alukach deleted the fix/140/add-k8s-health-probes branch March 28, 2026 02:54
alukach pushed a commit that referenced this pull request Mar 28, 2026
🤖 I have created a release *beep* *boop*
---


##
[1.0.3](v1.0.2...v1.0.3)
(2026-03-28)


### Bug Fixes

* **auth-extension:** consider link method when adding auth:refs
([158f507](158f507))
* disable server reload by default
([c109801](c109801)),
closes
[#142](#142)
* **helm:** add liveliness and readiness probes to helm chart
([#143](#143))
([ba55f64](ba55f64))
* **lifespan:** handle gateway errors on server health checks
([4e00c0e](4e00c0e)),
closes
[#141](#141)

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: ds-release-bot[bot] <116609932+ds-release-bot[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Helm: add Liveness/Readiness for stac-proxy deployment

3 participants