fix(helm): grant node read access for GPU capacity checks #1106

Merged
jtoelke2 merged 5 commits into main from jtoelke/os-49-gpu-node-rbac
May 1, 2026
@jtoelke2
Collaborator

@jtoelke2 jtoelke2 commented May 1, 2026

Summary

Grant the OpenShell Helm release cluster-scoped read access to Kubernetes nodes so the GPU capacity validation can list node capacity before sandbox creation.

Related Issue

OS-49: https://linear.app/nvidia/issue/OS-49/migrate-github-runners-to-a-supported-solution

Changes

  • Add a Helm ClusterRole allowing get/list/watch on nodes (core API group).
  • Bind that ClusterRole to the configured OpenShell service account in the release namespace.
  • Restore the node read permission required by the Kubernetes driver GPU capacity check.
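
The first two changes could render to manifests roughly like the ones below. This is a sketch under stated assumptions, not the chart's actual template output: the name `openshell-node-reader` and the service account name `openshell` are hypothetical, and the namespace is taken from the `--namespace openshell` flag used in testing.

```yaml
# Sketch only: resource names are assumptions, not taken from the chart.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: openshell-node-reader   # hypothetical name
rules:
  - apiGroups: [""]             # "" selects the core API group
    resources: ["nodes"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: openshell-node-reader   # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: openshell-node-reader
subjects:
  - kind: ServiceAccount
    name: openshell             # assumed service account name
    namespace: openshell        # release namespace
```

A ClusterRole is needed here rather than a namespaced Role because nodes are cluster-scoped resources. With the assumed names above, `kubectl auth can-i list nodes --as=system:serviceaccount:openshell:openshell` should report `yes` once the binding is applied.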

Testing

  • helm template openshell deploy/helm/openshell --namespace openshell
  • env PATH=/home/jtoelke/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin SCCACHE_DISABLE=1 mise run pre-commit
  • GPU E2E rerun after merge/deploy

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)
  • Architecture docs updated (not applicable: Helm RBAC-only fix)

Signed-off-by: Jonas Toelke <jtoelke@nvidia.com>
@jtoelke2 jtoelke2 requested a review from a team as a code owner May 1, 2026 04:23
@jtoelke2 jtoelke2 added the test:e2e-gpu label (Requires GPU end-to-end coverage) May 1, 2026
@github-actions

github-actions Bot commented May 1, 2026

Label test:e2e-gpu applied for ce4e538. Open the existing run and click Re-run all jobs to execute with the label set. The E2E Gate check on this PR will flip green automatically once the run finishes.

jtoelke2 added 4 commits May 1, 2026 06:21
Signed-off-by: Jonas Toelke <jtoelke@nvidia.com>
Allow /dev/dxg and /usr/lib/wsl as GPU baseline paths so WSL CDI GPU sandboxes can initialize NVML. Native Linux skips these entries when the paths do not exist.
@jtoelke2 jtoelke2 merged commit 32857eb into main May 1, 2026
67 of 71 checks passed
@jtoelke2 jtoelke2 deleted the jtoelke/os-49-gpu-node-rbac branch May 1, 2026 22:49

Labels

test:e2e-gpu: Requires GPU end-to-end coverage

2 participants