Add native-Go OHE support to Connect chart and update RBAC#786
Add native-Go OHE support to Connect chart and update RBAC#786
Conversation
|
Hardly the most important thing but I think this rates a minor version bump. |
- Add `kubernetes.enabled` and related values to rstudio-connect for native-Go off-host execution (OHE) support - Update templates and config to support new OHE mode, overlays, and init container - Add validation to prevent both `launcher.enabled` and `kubernetes.enabled` - Update RBAC in rstudio-library to grant endpointslice list/watch permissions - Bump rstudio-connect to 0.8.27 and rstudio-library to 0.1.37 - Update NEWS.md and README.md.gotmpl for both charts
65644c0 to
d16a4a3
Compare
|
|
||
| ## 0.8.27 | ||
|
|
||
| - Adds support for a native-Go implementation of Off-Host Execution (OHE) mode. Set `kubernetes.enabled = true` to switch to the new implementation. See the [migration guide](<todo: insert migration guide docs link>) in the Connect admin documentation for guidance on migrating existing configurations. |
There was a problem hiding this comment.
Will the migration guide be in place before this PR is merged?
There was a problem hiding this comment.
Yes, I'm planning to merge this PR shortly after the March release when we can officially announce the feature.
There was a problem hiding this comment.
I think we should add more details to the change log. Maybe showing a brief example of how you would do templating with Launcher and how you now do it with the new native Kubernetes.
| # -- defaultJobOverlay contains the Kubernetes Job definition which is used as an overlay "base" when launching a content job | ||
| # in Kubernetes. Conceptually this is a similar to a Kustomize base. Connect then applies any required job configuration | ||
| # on-top of the overlay base to produce a final job definition before submitting the Job to Kubernetes. | ||
| # https://docs.posit.co/connect/__unreleased__/admin/appendix/off-host/direct-runner/kubernetes-job-overlays.html |
There was a problem hiding this comment.
This link will need to be updated prior to merging.
| # ([reference](https://docs.posit.co/chronicle/appendix/library/advanced-agent.html#environment)) | ||
| agentEnvironment: "" | ||
|
|
||
| kubernetes: |
There was a problem hiding this comment.
Is everyone happy with a value namespace of "kubernetes"?
There was a problem hiding this comment.
Yeah, i'd love to see this change to something more descriptive. goLauncher or nativeLauncher seem both more descriptive and more specific.
There was a problem hiding this comment.
This is a native implementation of off-host that doesn't use the launcher so that seems misleading.
kubernetes describes the backend that is being configured which is one of the available off-host execution backends. Currently only Launcher or Kubernetes (the new implementation) are accepted.
Maybe it makes sense to add a new offHostExecution namespace?
offHost:
kubernetes:
...or
ohe:
kubernetes:
...There was a problem hiding this comment.
I'd ask you to think about it from an admin perspective. They have no clue that this doesn't use something called launcher right? They don't/won't understand the difference between our internal "launcher" and this native approach. In their mind "launching" is a function, not a separate part of the code base. So I don't think this will be misleading to that audience.
That said, I'm totally cool with offHost I'd ask that we remove/eliminate the kubernetes concept/yaml level entirely. To my knowledge, Helm isn't applicable outside of Kubernetes, so why over specify it? Are you envisioning a future where we need an offHost.slurm? Since this a helm change, I'd treat it with that audience in mind. If there's a deeper meaning in the code, or elements of the future Connect roadmap that I'm not aware of, then maybe this distinction makes sense.
There was a problem hiding this comment.
Yes that was part of my reasoning with this recommendation.
We are working toward allowing multiple backends to be enabled at the same time so in that world it’s feasible that we could have connect deployed via helm in k8s and capable of running some jobs in k8s and some jobs in slurm. We’re not currently working on slurm though, so maybe that’s premature.
I think my hesitation about the launcher name is that that section configures the [Launcher] section in Connect’s configuration. Similarly, kubernetes would configure the [Kubernetes] section in the gcfg. To me that felt like a natural jump
There was a problem hiding this comment.
I don't love the namespace Kubernetes.
kubernetes would configure the [Kubernetes]
Are we fully backed on the [Kubernetes] name space in the Connect config? Maybe we can lean more into "offHost". E.g. [offHostKubernees], [offHostSlurm], etc. and then the values.yaml can follow a similar pattern.
offHost:
Kubernetes:
x: z
y: bAlso, I agree with everything Sam is saying about Launcher. That's inside baseball and I think users find it confusing.
One other thing I'm thinking about is our branding of this. Do we want to call it the native Go implementation? I think the fact that it's written in Go is also inside baseball and just adds another element of potential confusion. This is probably going to be the default in the future as a way we will want customers to run in Kubernetes. So maybe we want to lean into off-host Kubernetes and we can move away from any language around Launcher and not introduce language about Go.
| # -- defaultServiceOverlay contains the Kubernetes Service definition which is used as an overlay "base" when creating a content job's | ||
| # Service in Kubernetes. Conceptually this is a similar to a Kustomize base. Connect then applies any required Service configuration | ||
| # on-top of the overlay base to produce a final Service definition. | ||
| # https://docs.posit.co/connect/__unreleased__/admin/appendix/off-host/direct-runner/kubernetes-job-overlays.html |
Strongly agree |
samcofer
left a comment
There was a problem hiding this comment.
One thing I'd love to see come along with these changes are some examples of side by side conversions from "old-launcher" to "new-launcher" for some key values. If we can get those added to examples/connect/launcher-migration, it would be a huge help for SE's when we're talking to customers about moving over.
| | kubernetes.defaultInitContainer.repository | string | `"ghcr.io/rstudio/rstudio-connect-content-init"` | The repository to use for the Content InitContainer image | | ||
| | kubernetes.defaultInitContainer.tag | string | `""` | Overrides the image tag whose default is the chart appVersion. | | ||
| | kubernetes.defaultInitContainer.tagPrefix | string | `"ubuntu2204-"` | A tag prefix for the Content InitContainer image (common selections: jammy-, ubuntu2204-). Only used if tag is not defined | | ||
| | kubernetes.defaultJobOverlay | object | `{"apiVersion":"batch/v1","kind":"Job","spec":{"template":{"spec":{"containers":[{"name":"connect-content","volumeMounts":[{"mountPath":"/opt/rstudio-connect/R","name":"rsc-volume","subPath":"R"},{"mountPath":"/opt/rstudio-connect/python","name":"rsc-volume","subPath":"python"},{"mountPath":"/opt/rstudio-connect/scripts","name":"rsc-volume","subPath":"scripts"},{"mountPath":"/opt/rstudio-connect/ext","name":"rsc-volume","subPath":"ext"}]}],"initContainers":[{"name":"connect-content-init","volumeMounts":[{"mountPath":"/mnt/rstudio-connect-runtime/","name":"rsc-volume"}]}],"volumes":[{"emptyDir":{},"name":"rsc-volume"}]}}}}` | defaultJobOverlay contains the Kubernetes Job definition which is used as an overlay "base" when launching a content job in Kubernetes. Conceptually this is a similar to a Kustomize base. Connect then applies any required job configuration on-top of the overlay base to produce a final job definition before submitting the Job to Kubernetes. https://docs.posit.co/connect/__unreleased__/admin/appendix/off-host/direct-runner/kubernetes-job-overlays.html https://kubernetes.io/docs/tasks/manage-kubernetes-objects/kustomization/#bases-and-overlays | |
There was a problem hiding this comment.
Why is there a populated defaultJobOverlay and not one for defaultServiceOverlay?
Are the values you have listed there meaningful? ie, if I override this with a less complete object, does the content fail?
There was a problem hiding this comment.
Yes, the defaults defined in the chart are required to configure the init-container. Any customization should be additive to the default values for the Job overlay.
The service overlay doesn't include any defaults because Connect will create a ClusterIP service by default.
Note: We'll be renaming defaultJobOverlay/defaultServiceOverlay to be defaultResourceJobBase/defaultResourceServiceBase for clarity. Calling them overlays is a little misleading.
There was a problem hiding this comment.
I think that this behavior:
Connect will create a ClusterIP service by default.
And this behavior:
the defaults defined in the chart are required to configure the init-container.
Seems counter intuitive right? Why does Connect do something without anything configured in the first case, but then require configuration to do the basics in the second case? My vote is to add the redundant serviceOverlay to the chart with the values that Connect is already defaulting too.
There was a problem hiding this comment.
Would it help if we moved the init container configuration into connect the helm chart logic so that no default job manifest is required unless you want to customize the job ? Maybe that simplification in the chart along with a fully commented (optional/example) job structure would be best.
More generally about the overlays, while it’s definitely more complex because of the deep nesting, the new job overlays are intended to be more straightforward and less surprising than the launcher templates. The job overlay is a real kubernetes job manifest so the official kubernetes documentation can be used as a valid reference rather than having to rely only on the examples from our helm chart docs.
There was a problem hiding this comment.
no default job manifest is required unless you want to customize the job
I think this approach makes sense. Just like with the old template values, you don't need to set anything unless you're doing customization.
The job overlay is a real kubernetes job manifest so the official kubernetes documentation can be used as a valid reference rather than having to rely only on the examples from our helm chart docs.
I think we should include several examples in the Helmchart documentation. And let's make it really clear that what you're putting in here is a job/service spec and we can link to the official Kubernetes docs.
| # ([reference](https://docs.posit.co/chronicle/appendix/library/advanced-agent.html#environment)) | ||
| agentEnvironment: "" | ||
|
|
||
| kubernetes: |
There was a problem hiding this comment.
Yeah, i'd love to see this change to something more descriptive. goLauncher or nativeLauncher seem both more descriptive and more specific.
| name: rsc-volume | ||
| initContainers: | ||
| - name: connect-content-init # well-known name | ||
| # Note: Image is set dyncamically by kubernetes.defaultInitContainer |
| # https://docs.posit.co/connect/__unreleased__/admin/appendix/off-host/direct-runner/kubernetes-job-overlays.html | ||
| # https://kubernetes.io/docs/tasks/manage-kubernetes-objects/kustomization/#bases-and-overlays | ||
| defaultJobOverlay: | ||
| apiVersion: batch/v1 |
There was a problem hiding this comment.
Can we comment this out like the defaultServiceOverlay?
There was a problem hiding this comment.
And in that commented structure, can we call out the locations of the existing launcher.templateValues? similar to this:
helm/charts/rstudio-connect/values.yaml
Lines 346 to 380 in 87ae10f
Given that the default pathing is deeper and requires a more detailed knowledge of Kubernetes nesting and yaml, it would be nice to provide that structure as a bridge
There was a problem hiding this comment.
We'll provide a detailed upgrade guide that fully documents each field that users might have configured in the existing launcher.templateValues structure, along with where the values belong in the new job overlay.
We'll also provide a SKILL.md that customers can use to help automate the migration with claude or some other LLM.
Given that the default pathing is deeper and requires a more detailed knowledge of Kubernetes nesting and yaml, it would be nice to provide that structure as a bridge
I'm not quite sure what you mean by this, could you explain? Are you saying that we should include a fully populated and commented job/pod manifest so the users can see the structure?
There was a problem hiding this comment.
I think the example you provided above is helpful. I mean generally that in order to set a label on the content pod today, I need to set launcher.templateValues.pod.env but with the new structure being closer to a pod manifest, the path I need to find is, if I understand correctly, offHost.kubernetes.defaultJobOverlay.spec.template.spec.containers.env. Which is much deeper in the tree and feels slightly less intuitive, especially to admins who don't use Kubernetes regularly or are trying to learn.
I'm not suggesting that we provide an entire pod structure, I guess what I'm suggesting is that it would be helpful to provide a mapping guide from 1 set of values to the other.
Take the env example above, did the previous method of setting environment variables, pod.env also set those environment variables in the initContainer and any sidecars? Or was that separate? Or previous model masked some complexity and misled people about how some of the values mapped, imo. So, if I wanted to have the same output yaml between launcher and offHost how would I do that?
I'm assuming that this request is pretty well covered in your first paragraph, but it would be good to actually include that in the helm repo as an example as well.
There was a problem hiding this comment.
My preference would be to have little or no commented out YAML in the values.yaml. I always find that a bit confusing whether it was meant to be commented out or not. Instead, I'd rather have robust documentation. or comments that explain what can go in.
I also agree with Sam though that some of this documentation should be in the Helm Chart itself. Maybe some examples in the README.
| apiVersion: batch/v1 | ||
| kind: Job | ||
| spec: | ||
| template: | ||
| spec: | ||
| containers: | ||
| - name: connect-content | ||
| volumeMounts: | ||
| - mountPath: /opt/rstudio-connect | ||
| name: rsc-volume | ||
| initContainers: | ||
| - name: connect-content-init | ||
| volumeMounts: | ||
| - mountPath: /mnt/rstudio-connect-runtime/ | ||
| name: rsc-volume | ||
| volumes: | ||
| - emptyDir: {} | ||
| name: rsc-volume |
There was a problem hiding this comment.
@samcofer something like this?
| apiVersion: batch/v1 | |
| kind: Job | |
| spec: | |
| template: | |
| spec: | |
| containers: | |
| - name: connect-content | |
| volumeMounts: | |
| - mountPath: /opt/rstudio-connect | |
| name: rsc-volume | |
| initContainers: | |
| - name: connect-content-init | |
| volumeMounts: | |
| - mountPath: /mnt/rstudio-connect-runtime/ | |
| name: rsc-volume | |
| volumes: | |
| - emptyDir: {} | |
| name: rsc-volume | |
| apiVersion: batch/v1 | |
| kind: Job | |
| metadata: | |
| labels: # templateValues.job.labels | |
| app: data-processing | |
| environment: staging | |
| annotations: # templateValues.job.annotations | |
| my.company.com/annotation1: value | |
| spec: | |
| # Job lifecycle management | |
| ttlSecondsAfterFinished: 86400 # Time to live for the Job and its pods after they complete (e.g., 24 hours) | |
| template: | |
| metadata: | |
| labels: # templateValues.pod.labels | |
| app: data-processing | |
| annotations: # templateValues.pod.annotations | |
| my.company.com/annotation1: value | |
| spec: | |
| containers: | |
| - name: processing-container # sidecar container | |
| image: busybox | |
| command: | |
| [ | |
| "sh", | |
| "-c", | |
| "echo 'Starting job...'; sleep 60; echo 'Job finished'", | |
| ] | |
| - name: connect-content | |
| resources: # templateValues.pod.resources | |
| requests: | |
| memory: "64Mi" | |
| cpu: "250m" | |
| limits: | |
| memory: "128Mi" | |
| cpu: "500m" | |
| env: # templateValues.pod.env | |
| - name: LOG_LEVEL | |
| value: "INFO" | |
| volumeMounts: | |
| - mountPath: /opt/rstudio-connect | |
| name: rsc-volume | |
| initContainers: | |
| - name: connect-content-init | |
| volumeMounts: | |
| - mountPath: /mnt/rstudio-connect-runtime/ | |
| name: rsc-volume | |
| volumes: | |
| - emptyDir: {} | |
| name: rsc-volume | |
| nodeSelector: # templateValues.pod.nodeSelector | |
| kubernetes.io/os: linux | |
| tolerations: # templateValues.pod.tolerations | |
| - key: "key" | |
| operator: "Equal" | |
| value: "value" | |
| effect: "NoSchedule" |
There was a problem hiding this comment.
Yup, that's pretty much what I'm talking about. I still think it would be a benefit to have an example in the examples folder that has both a full pod: and new launcher defined, with some comments explaining how they produce the same yaml.
SamEdwardes
left a comment
There was a problem hiding this comment.
Thank you for the work on this, David. I like this new approach and the flexibility that allows admins to make changes without us having to template in every single possible specification.
A left a few comments. Happy to discuss anything further.
| name: rstudio-connect | ||
| description: Official Helm chart for Posit Connect | ||
| version: 0.8.26 | ||
| version: 0.8.27 |
There was a problem hiding this comment.
Piling on with the others. Lets make this 0.9. Big change.
|
|
||
| ## 0.8.27 | ||
|
|
||
| - Adds support for a native-Go implementation of Off-Host Execution (OHE) mode. Set `kubernetes.enabled = true` to switch to the new implementation. See the [migration guide](<todo: insert migration guide docs link>) in the Connect admin documentation for guidance on migrating existing configurations. |
There was a problem hiding this comment.
I think we should add more details to the change log. Maybe showing a brief example of how you would do templating with Launcher and how you now do it with the new native Kubernetes.
| | kubernetes.defaultInitContainer.repository | string | `"ghcr.io/rstudio/rstudio-connect-content-init"` | The repository to use for the Content InitContainer image | | ||
| | kubernetes.defaultInitContainer.tag | string | `""` | Overrides the image tag whose default is the chart appVersion. | | ||
| | kubernetes.defaultInitContainer.tagPrefix | string | `"ubuntu2204-"` | A tag prefix for the Content InitContainer image (common selections: jammy-, ubuntu2204-). Only used if tag is not defined | | ||
| | kubernetes.defaultJobOverlay | object | `{"apiVersion":"batch/v1","kind":"Job","spec":{"template":{"spec":{"containers":[{"name":"connect-content","volumeMounts":[{"mountPath":"/opt/rstudio-connect/R","name":"rsc-volume","subPath":"R"},{"mountPath":"/opt/rstudio-connect/python","name":"rsc-volume","subPath":"python"},{"mountPath":"/opt/rstudio-connect/scripts","name":"rsc-volume","subPath":"scripts"},{"mountPath":"/opt/rstudio-connect/ext","name":"rsc-volume","subPath":"ext"}]}],"initContainers":[{"name":"connect-content-init","volumeMounts":[{"mountPath":"/mnt/rstudio-connect-runtime/","name":"rsc-volume"}]}],"volumes":[{"emptyDir":{},"name":"rsc-volume"}]}}}}` | defaultJobOverlay contains the Kubernetes Job definition which is used as an overlay "base" when launching a content job in Kubernetes. Conceptually this is a similar to a Kustomize base. Connect then applies any required job configuration on-top of the overlay base to produce a final job definition before submitting the Job to Kubernetes. https://docs.posit.co/connect/__unreleased__/admin/appendix/off-host/direct-runner/kubernetes-job-overlays.html https://kubernetes.io/docs/tasks/manage-kubernetes-objects/kustomization/#bases-and-overlays | |
There was a problem hiding this comment.
no default job manifest is required unless you want to customize the job
I think this approach makes sense. Just like with the old template values, you don't need to set anything unless you're doing customization.
The job overlay is a real kubernetes job manifest so the official kubernetes documentation can be used as a valid reference rather than having to rely only on the examples from our helm chart docs.
I think we should include several examples in the Helmchart documentation. And let's make it really clear that what you're putting in here is a job/service spec and we can link to the official Kubernetes docs.
| # ([reference](https://docs.posit.co/chronicle/appendix/library/advanced-agent.html#environment)) | ||
| agentEnvironment: "" | ||
|
|
||
| kubernetes: |
There was a problem hiding this comment.
I don't love the namespace Kubernetes.
kubernetes would configure the [Kubernetes]
Are we fully backed on the [Kubernetes] name space in the Connect config? Maybe we can lean more into "offHost". E.g. [offHostKubernees], [offHostSlurm], etc. and then the values.yaml can follow a similar pattern.
offHost:
Kubernetes:
x: z
y: bAlso, I agree with everything Sam is saying about Launcher. That's inside baseball and I think users find it confusing.
One other thing I'm thinking about is our branding of this. Do we want to call it the native Go implementation? I think the fact that it's written in Go is also inside baseball and just adds another element of potential confusion. This is probably going to be the default in the future as a way we will want customers to run in Kubernetes. So maybe we want to lean into off-host Kubernetes and we can move away from any language around Launcher and not introduce language about Go.
| # -- The namespace to launch connect-content jobs into. Uses the Release namespace by default | ||
| namespace: "" | ||
| defaultInitContainer: | ||
| # -- Whether to enable the defaultInitContainer. If disabled, you must ensure that the session components are available another way. |
There was a problem hiding this comment.
| # -- Whether to enable the defaultInitContainer. If disabled, you must ensure that the session components are available another way. | |
| # -- Whether to enable the defaultInitContainer. If disabled, you must ensure | |
| # that the session components are available another way. Changing the default | |
| # setting is an advanced option and not recommended. For more information on | |
| # how Connect uses the session init container refer to | |
| # https://docs.posit.co/connect/admin/appendix/off-host/arch-overview/#runtime-init-container |
| # https://docs.posit.co/connect/__unreleased__/admin/appendix/off-host/direct-runner/kubernetes-job-overlays.html | ||
| # https://kubernetes.io/docs/tasks/manage-kubernetes-objects/kustomization/#bases-and-overlays | ||
| defaultJobOverlay: | ||
| apiVersion: batch/v1 |
There was a problem hiding this comment.
My preference would be to have little or no commented out YAML in the values.yaml. I always find that a bit confusing whether it was meant to be commented out or not. Instead, I'd rather have robust documentation. or comments that explain what can go in.
I also agree with Sam though that some of this documentation should be in the Helm Chart itself. Maybe some examples in the README.
| # in Kubernetes. Conceptually this is a similar to a Kustomize base. Connect then applies any required job configuration | ||
| # on-top of the overlay base to produce a final job definition before submitting the Job to Kubernetes. | ||
| # https://docs.posit.co/connect/__unreleased__/admin/appendix/off-host/direct-runner/kubernetes-job-overlays.html | ||
| # https://kubernetes.io/docs/tasks/manage-kubernetes-objects/kustomization/#bases-and-overlays |
There was a problem hiding this comment.
Also add a link to the job spec.
| # https://kubernetes.io/docs/tasks/manage-kubernetes-objects/kustomization/#bases-and-overlays | |
| # https://kubernetes.io/docs/tasks/manage-kubernetes-objects/kustomization/#bases-and-overlays | |
| # https://kubernetes.io/docs/concepts/workloads/controllers/job/ |
| # -- defaultServiceOverlay contains the Kubernetes Service definition which is used as an overlay "base" when creating a content job's | ||
| # Service in Kubernetes. Conceptually this is a similar to a Kustomize base. Connect then applies any required Service configuration | ||
| # on-top of the overlay base to produce a final Service definition. | ||
| # https://docs.posit.co/connect/__unreleased__/admin/appendix/off-host/direct-runner/kubernetes-job-overlays.html |
There was a problem hiding this comment.
Also add a link to service spec.
| # https://docs.posit.co/connect/__unreleased__/admin/appendix/off-host/direct-runner/kubernetes-job-overlays.html | |
| # https://docs.posit.co/connect/__unreleased__/admin/appendix/off-host/direct-runner/kubernetes-job-overlays.html | |
| # https://kubernetes.io/docs/concepts/services-networking/service/ |
| # Starting with Connect v2023.05.0, this configuration is used to bootstrap the initial set of execution environments | ||
| # the first time the server starts. If any execution environments already exist in the database, these values are ignored; | ||
| # execution environments are not created or modified during subsequent restarts. | ||
| customRuntimeYaml: "base" |
There was a problem hiding this comment.
How are we handling custom runtime definitions with the new native Kubernetes approach?
I don't like our current approach with Launcher. It often trips up users that they define custom runtime environments in this YAML, but they only ever apply the first time you deploy Connect. If it's possible to define runtimes that get applied every time you update the values.yaml, That would be great, but I don't think that makes sense given how users can add manually or with an API.
So I would suggest for the new native approach, we don't provide the ability to bootstrap the runtime environments. It causes more confusion than benefit.
This PR adds experimental support for the new native-Go implementation of OHE. This implementation offers a simplified configuration of Kubernetes Jobs using simple a "overlay base" (similar to a Kustomize overlay base) to define the Job and Service as yaml. This offers an extensible approach for job customization w/o the maintenance burden of go templates.
kubernetes.enabledand related values to rstudio-connect for native-Go off-host execution (OHE) supportlauncher.enabledandkubernetes.enabled