
MILAB-6382: expose host NVIDIA driver via runtime envs #91

Merged: blackcat merged 1 commit into main from fix/rapids-nvidia-env on Jun 17, 2026


blackcat commented Jun 17, 2026

Greptile Summary

This PR adds four NVIDIA GPU environment variables to the python-3.12.10-rapids run environment so that Python processes can discover and use the host NVIDIA driver when running inside a GPU-enabled container.

  • NVIDIA_VISIBLE_DEVICES=all and NVIDIA_DRIVER_CAPABILITIES=compute,utility: NVIDIA Container Runtime control variables that expose all GPU devices and mount CUDA compute + utility capabilities into the container. The NVIDIA Container Toolkit reads them from the container's environment when the container is created.
  • LD_LIBRARY_PATH and PATH: LD_LIBRARY_PATH is set to /usr/local/nvidia/lib64:/usr/local/nvidia/lib and /usr/local/nvidia/bin is prepended to PATH, so the dynamic linker and the shell can find NVIDIA shared libraries and utilities. The PATH entry uses ${PATH} variable interpolation, a pattern not previously used in any envVars field in this codebase and whose runtime support is unconfirmed. LD_LIBRARY_PATH does not reference its existing value, so it overwrites any previously set paths.
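One way to check, from inside the run environment, whether these variables actually reached the Python process is a small standard-library diagnostic like the sketch below. `gpu_env_report` is a hypothetical helper, not part of the package, and `libcuda.so.1` is assumed to be the name of the CUDA driver library on the Linux host.

```python
import ctypes
import os
import shutil
import sys


def gpu_env_report() -> dict:
    """Report what this process can see of the NVIDIA driver setup.

    Hypothetical diagnostic: reads the variables this PR sets and probes
    for nvidia-smi and the CUDA driver library without importing RAPIDS.
    """
    ld_path = os.environ.get("LD_LIBRARY_PATH", "")
    report = {
        "NVIDIA_VISIBLE_DEVICES": os.environ.get("NVIDIA_VISIBLE_DEVICES"),
        "NVIDIA_DRIVER_CAPABILITIES": os.environ.get("NVIDIA_DRIVER_CAPABILITIES"),
        # Split into entries, dropping empty segments for readability.
        "LD_LIBRARY_PATH": [p for p in ld_path.split(os.pathsep) if p],
        # shutil.which searches the current PATH, so a broken PATH shows up here.
        "nvidia_smi": shutil.which("nvidia-smi"),
        "libcuda_loadable": False,
    }
    if sys.platform.startswith("linux"):
        try:
            # The dynamic loader uses LD_LIBRARY_PATH as it was at process
            # start; editing os.environ afterwards does not change this lookup.
            ctypes.CDLL("libcuda.so.1")
            report["libcuda_loadable"] = True
        except OSError:
            pass
    return report


if __name__ == "__main__":
    for key, value in gpu_env_report().items():
        print(f"{key}: {value}")
```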

Key terms touched by this PR:

| Term | Definition | Change |
|---|---|---|
| envVars | Array of KEY=VALUE strings in the block-software artifact spec; applied as environment variables when the Platforma runtime activates the run environment | Extended from 1 entry (RPY2_CFFI_MODE) to 5 entries; introduces variable interpolation (${PATH}) for the first time in this codebase |
| NVIDIA_VISIBLE_DEVICES | NVIDIA Container Runtime variable controlling which GPU devices are visible inside the container | Newly added as `all`, exposing every available GPU to the Python process |
| NVIDIA_DRIVER_CAPABILITIES | NVIDIA Container Runtime variable specifying which driver API surface areas (compute, utility, video, etc.) are mounted into the container | Newly added as `compute,utility`, enabling CUDA compute and nvidia-smi-style utility access |
| LD_LIBRARY_PATH | Linux dynamic linker search path for shared libraries | Newly set to /usr/local/nvidia/lib64:/usr/local/nvidia/lib; overwrites any pre-existing value |
| PATH | Executable search path used by the shell and most launchers | Newly prepended with /usr/local/nvidia/bin using ${PATH} interpolation, a pattern unverified for this JSON-based config format |
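Because each envVars entry is a single KEY=VALUE string, whatever consumes it has to split it before calling anything like setenv. The sketch below assumes the consumer splits on the first `=` and applies values verbatim; that is an assumption about the Platforma runtime, not confirmed behavior. `parse_env_vars` is a hypothetical helper, and the five entries are the ones visible in this PR's diff context.

```python
from typing import Dict, List


def parse_env_vars(entries: List[str]) -> Dict[str, str]:
    """Turn an envVars array of KEY=VALUE strings into a dict.

    Splits on the first '=' only, so values may themselves contain '='.
    Values are kept verbatim: no ${...} expansion happens here.
    """
    parsed: Dict[str, str] = {}
    for entry in entries:
        key, sep, value = entry.partition("=")
        if not sep or not key:
            raise ValueError(f"not a KEY=VALUE entry: {entry!r}")
        parsed[key] = value
    return parsed


ENV_VARS = [
    "RPY2_CFFI_MODE=ABI",
    "NVIDIA_VISIBLE_DEVICES=all",
    "NVIDIA_DRIVER_CAPABILITIES=compute,utility",
    "LD_LIBRARY_PATH=/usr/local/nvidia/lib64:/usr/local/nvidia/lib",
    "PATH=/usr/local/nvidia/bin:${PATH}",
]

if __name__ == "__main__":
    env = parse_env_vars(ENV_VARS)
    print(len(env))     # 5
    print(env["PATH"])  # /usr/local/nvidia/bin:${PATH}
```

Read this way, the PATH value still contains the literal text `${PATH}`; whether anything expands it is exactly the open question in this review.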

Confidence Score: 3/5

The change is small but introduces a ${PATH} interpolation pattern that has never appeared in any other envVars entry in this codebase; if the Platforma runtime sets environment variables as literal strings rather than through a shell, the Python run environment's PATH will be broken on every execution.

The ${PATH} interpolation in PATH=/usr/local/nvidia/bin:${PATH} is the first of its kind across all envVars definitions in the repository. If the runtime simply calls setenv/putenv with the raw string, every binary lookup outside /usr/local/nvidia/bin will fail. This needs explicit confirmation before the package ships. Secondary concerns are the unconditional overwrite of LD_LIBRARY_PATH and the NVIDIA-only paths being applied to macOS and Windows roots.
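The difference between the two runtime behaviors can be reproduced in isolation. This POSIX-only sketch uses a temporary directory with a stub `python` executable as a stand-in for the original PATH; `make_stub_executable` and `lookup_python` are hypothetical helpers, and `string.Template` is used here only to model a runtime that does perform `${...}` interpolation.

```python
import os
import shutil
import stat
import tempfile
from string import Template
from typing import Optional, Tuple

RAW_PATH = "/usr/local/nvidia/bin:${PATH}"


def make_stub_executable(directory: str, name: str = "python") -> str:
    """Create an executable shell stub so shutil.which can find it."""
    path = os.path.join(directory, name)
    with open(path, "w") as fh:
        fh.write("#!/bin/sh\nexit 0\n")
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


def lookup_python(original_dir: str) -> Tuple[Optional[str], Optional[str]]:
    """Resolve 'python' under a literal and an interpolated PATH value.

    original_dir stands in for the PATH the process had before envVars
    were applied.
    """
    # Runtime copies the string verbatim (e.g. via setenv): "${PATH}" stays
    # text and is treated as a relative directory that does not exist.
    literal = RAW_PATH
    # Runtime expands ${PATH} against the previous environment.
    expanded = Template(RAW_PATH).safe_substitute(PATH=original_dir)
    return (shutil.which("python", path=literal),
            shutil.which("python", path=expanded))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        make_stub_executable(tmp)
        literal_hit, expanded_hit = lookup_python(tmp)
        print("literal: ", literal_hit)
        print("expanded:", expanded_hit)
```

Unless `/usr/local/nvidia/bin` happens to contain a `python`, the literal lookup finds nothing while the expanded one finds the stub, which is the failure mode described above.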

Files needing attention: python-3.12.10-rapids/package.json, specifically the PATH entry with ${PATH} interpolation and the LD_LIBRARY_PATH overwrite, both of which need verification before merging.

Important Files Changed

| Filename | Overview |
|---|---|
| python-3.12.10-rapids/package.json | Adds four NVIDIA GPU runtime env vars to expose the host driver; the ${PATH} interpolation syntax is untested in this JSON context and could break the run environment if the runtime doesn't expand it. |
| .changeset/lemon-pants-chew.md | Standard changeset entry recording a minor version bump for the RAPIDS package; description is brief but accurate. |

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[package.json block-software envVars] --> B{Runtime applies envVars}
    B -->|Literal string copy| C["PATH = /usr/local/nvidia/bin:\${PATH}\n⚠️ \${PATH} not expanded"]
    B -->|Literal string copy| D["LD_LIBRARY_PATH = /usr/local/nvidia/lib64:/usr/local/nvidia/lib\n⚠️ Pre-existing value overwritten"]
    B -->|Simple assignment| E["NVIDIA_VISIBLE_DEVICES = all"]
    B -->|Simple assignment| F["NVIDIA_DRIVER_CAPABILITIES = compute,utility"]
    C -->|If interpolation unsupported| G["❌ PATH broken — only nvidia/bin visible"]
    C -->|If interpolation supported| H["✅ nvidia/bin prepended to original PATH"]
    D --> I["NVIDIA libs findable by linker\n⚠️ Other LD paths dropped"]
    E --> J["Container Runtime exposes all GPUs"]
    F --> K["CUDA compute + utility APIs mounted"]
```


Prompt To Fix All With AI
Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 3
python-3.12.10-rapids/package.json:29
**`${PATH}` expansion not guaranteed in JSON `envVars`**

Every other `envVars` entry in this repo is a plain `KEY=VALUE` literal with no variable references. If the Platforma runtime sets these strings directly (e.g. via `setenv`) rather than passing them through a shell, `PATH` will be set to the literal string `/usr/local/nvidia/bin:${PATH}` — with the unexpanded text `${PATH}` as the suffix. That would break resolution of every binary (including `python` itself) that lives outside `/usr/local/nvidia/bin`, making the entire run environment non-functional. Please confirm that the runtime performs `${...}` interpolation before applying `envVars`, or replace this entry with a fully resolved static path.

### Issue 2 of 3
python-3.12.10-rapids/package.json:28-29
**NVIDIA runtime vars applied to all platforms including macOS and Windows**

`NVIDIA_VISIBLE_DEVICES`, `NVIDIA_DRIVER_CAPABILITIES`, and the `/usr/local/nvidia/...` paths are NVIDIA Container Runtime constructs that are only meaningful in Linux containers with GPU passthrough. The `roots` in this package include `macosx-x64`, `macosx-aarch64`, and `windows-x64`. On those platforms the paths won't exist, so `LD_LIBRARY_PATH` will list absent directories; this could interfere with any tool on macOS that honors `LD_LIBRARY_PATH`, even though the native macOS loader variable is `DYLD_LIBRARY_PATH`. If RAPIDS GPU features are explicitly unsupported on non-Linux platforms, consider applying these vars only to Linux targets (if the runtime supports platform-specific env vars), or document that the NVIDIA vars are no-ops on non-Linux platforms.

### Issue 3 of 3
python-3.12.10-rapids/package.json:28
**`LD_LIBRARY_PATH` silently overwrites any pre-existing value**

Unlike `PATH`, which at least attempts to incorporate the existing value via `${PATH}`, `LD_LIBRARY_PATH` is set to a fixed string. If the host container (or any previously applied runenv layer) already exports `LD_LIBRARY_PATH` entries (e.g. CUDA toolkit, cuDNN), those entries will be silently dropped, potentially causing runtime linker failures for libraries that depend on them. Using the same `${LD_LIBRARY_PATH}` pattern keeps existing paths intact — assuming the runtime supports interpolation.

```suggestion
              "LD_LIBRARY_PATH=/usr/local/nvidia/lib64:/usr/local/nvidia/lib:${LD_LIBRARY_PATH}",
```

Reviews (1): Last reviewed commit: "MILAB-6382: expose host NVIDIA driver vi..."

Greptile also left 3 inline comments on this PR.

blackcat merged commit 9eb0073 into main on Jun 17, 2026 (2 checks passed).

gemini-code-assist (bot) left a comment


Code Review

This pull request adds NVIDIA-related environment variables to the configuration of the python-3.12.10-rapids package. A critical issue was raised regarding the use of ${PATH} in the static JSON configuration, as it may not be expanded by the runtime executor and could break executable lookups.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

```json
"NVIDIA_VISIBLE_DEVICES=all",
"NVIDIA_DRIVER_CAPABILITIES=compute,utility",
"LD_LIBRARY_PATH=/usr/local/nvidia/lib64:/usr/local/nvidia/lib",
"PATH=/usr/local/nvidia/bin:${PATH}"
```


Priority: high

Using ${PATH} in a static JSON configuration file like package.json will likely not be expanded by the platform's runtime executor, resulting in a literal ${PATH} string in the environment variable. This will break standard executable lookups because the original path directories will be lost. If the platform runner does not support shell-style variable expansion for envVars, consider if this prepending is necessary or if the platform provides an alternative mechanism to append/prepend to the PATH variable.

```json
"NVIDIA_VISIBLE_DEVICES=all",
"NVIDIA_DRIVER_CAPABILITIES=compute,utility",
"LD_LIBRARY_PATH=/usr/local/nvidia/lib64:/usr/local/nvidia/lib",
"PATH=/usr/local/nvidia/bin:${PATH}"
```


P1 ${PATH} expansion not guaranteed in JSON envVars

Every other envVars entry in this repo is a plain KEY=VALUE literal with no variable references. If the Platforma runtime sets these strings directly (e.g. via setenv) rather than passing them through a shell, PATH will be set to the literal string /usr/local/nvidia/bin:${PATH} — with the unexpanded text ${PATH} as the suffix. That would break resolution of every binary (including python itself) that lives outside /usr/local/nvidia/bin, making the entire run environment non-functional. Please confirm that the runtime performs ${...} interpolation before applying envVars, or replace this entry with a fully resolved static path.
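Until the runtime's behavior is confirmed, a repository-level check could at least flag every envVars entry that relies on interpolation. The sketch below walks the parsed package.json generically, so it does not assume where envVars sits; `find_env_refs` is a hypothetical helper and the nesting in `SAMPLE` is invented for illustration (only the envVars values come from this PR).

```python
import json
import re
from typing import Any, Iterator, Tuple

# Matches ${NAME} references such as ${PATH} or ${LD_LIBRARY_PATH}.
VAR_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def find_env_refs(node: Any, where: str = "$") -> Iterator[Tuple[str, str, str]]:
    """Yield (location, entry, referenced_name) for each envVars entry that
    references another variable, searching the whole JSON document."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{where}.{key}"
            if key == "envVars" and isinstance(value, list):
                for i, entry in enumerate(value):
                    if isinstance(entry, str):
                        for match in VAR_REF.finditer(entry):
                            yield f"{child}[{i}]", entry, match.group(1)
            else:
                yield from find_env_refs(value, child)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_env_refs(item, f"{where}[{i}]")


# Hypothetical nesting; only the envVars values come from this PR.
SAMPLE = json.loads("""
{"block-software": {"artifacts": {"rapids": {"envVars": [
  "RPY2_CFFI_MODE=ABI",
  "PATH=/usr/local/nvidia/bin:${PATH}"
]}}}}
""")

if __name__ == "__main__":
    for location, entry, name in find_env_refs(SAMPLE):
        print(f"{location}: {entry!r} references ${{{name}}}")
```

Run in CI against each package.json, a non-empty result would make a new `${...}` entry a deliberate decision rather than an accident.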


Comment on lines +28 to +29

```json
"LD_LIBRARY_PATH=/usr/local/nvidia/lib64:/usr/local/nvidia/lib",
"PATH=/usr/local/nvidia/bin:${PATH}"
```


P2 NVIDIA runtime vars applied to all platforms including macOS and Windows

NVIDIA_VISIBLE_DEVICES, NVIDIA_DRIVER_CAPABILITIES, and the /usr/local/nvidia/... paths are NVIDIA Container Runtime constructs that are only meaningful in Linux containers with GPU passthrough. The roots in this package include macosx-x64, macosx-aarch64, and windows-x64. On those platforms the paths won't exist, so LD_LIBRARY_PATH will list absent directories; this could interfere with any tool on macOS that honors LD_LIBRARY_PATH, even though the native macOS loader variable is DYLD_LIBRARY_PATH. If RAPIDS GPU features are explicitly unsupported on non-Linux platforms, consider applying these vars only to Linux targets (if the runtime supports platform-specific env vars), or document that the NVIDIA vars are no-ops on non-Linux platforms.
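If the runtime (or a build step that generates package.json) can emit envVars per root, the gating could look like the sketch below. It assumes Linux roots are named with a `linux` prefix before the first `-` (only `macosx-x64`, `macosx-aarch64`, and `windows-x64` are named above), and `env_vars_for_root` is a hypothetical helper, not a Platforma API.

```python
from typing import List

# Entry present before this PR.
COMMON_ENV_VARS = ["RPY2_CFFI_MODE=ABI"]

# Entries added by this PR; meaningful only on Linux hosts using the
# NVIDIA Container Toolkit.
NVIDIA_ENV_VARS = [
    "NVIDIA_VISIBLE_DEVICES=all",
    "NVIDIA_DRIVER_CAPABILITIES=compute,utility",
    "LD_LIBRARY_PATH=/usr/local/nvidia/lib64:/usr/local/nvidia/lib",
    "PATH=/usr/local/nvidia/bin:${PATH}",
]


def env_vars_for_root(root: str) -> List[str]:
    """Return envVars for a root such as 'macosx-aarch64' or 'windows-x64'.

    Assumes the OS is the part before the first '-' and that Linux roots
    start with 'linux' (an assumption, not confirmed by the package).
    """
    os_name = root.split("-", 1)[0]
    if os_name == "linux":
        return COMMON_ENV_VARS + NVIDIA_ENV_VARS
    return list(COMMON_ENV_VARS)


if __name__ == "__main__":
    for root in ["linux-x64", "macosx-x64", "macosx-aarch64", "windows-x64"]:
        print(root, len(env_vars_for_root(root)))
```

Gating also sidesteps a Windows-specific problem: the PATH value is joined with `:`, whereas Windows separates PATH entries with `;`.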

```json
"RPY2_CFFI_MODE=ABI",
"NVIDIA_VISIBLE_DEVICES=all",
"NVIDIA_DRIVER_CAPABILITIES=compute,utility",
"LD_LIBRARY_PATH=/usr/local/nvidia/lib64:/usr/local/nvidia/lib",
```


P2 LD_LIBRARY_PATH silently overwrites any pre-existing value

Unlike PATH, which at least attempts to incorporate the existing value via ${PATH}, LD_LIBRARY_PATH is set to a fixed string. If the host container (or any previously applied runenv layer) already exports LD_LIBRARY_PATH entries (e.g. CUDA toolkit, cuDNN), those entries will be silently dropped, potentially causing runtime linker failures for libraries that depend on them. Using the same ${LD_LIBRARY_PATH} pattern keeps existing paths intact — assuming the runtime supports interpolation.

Suggested change:

```diff
-"LD_LIBRARY_PATH=/usr/local/nvidia/lib64:/usr/local/nvidia/lib",
+"LD_LIBRARY_PATH=/usr/local/nvidia/lib64:/usr/local/nvidia/lib:${LD_LIBRARY_PATH}",
```
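One caveat with the `${LD_LIBRARY_PATH}` form: if the variable is unset or empty, plain interpolation leaves a trailing `:`, and an empty segment is treated as the current working directory (by glibc's loader for `LD_LIBRARY_PATH`, and by POSIX shells for `PATH`). The sketch below shows a join that avoids this, assuming the runtime or an activation wrapper can run code; `prepend_path` is a hypothetical helper and `/opt/cuda/lib64` is an illustrative path only.

```python
import os
from string import Template
from typing import Iterable, Optional

NVIDIA_LIBS = ["/usr/local/nvidia/lib64", "/usr/local/nvidia/lib"]


def prepend_path(new_entries: Iterable[str], existing: Optional[str],
                 sep: str = os.pathsep) -> str:
    """Prepend directories to a PATH-style list, skipping empty segments
    and keeping only the first occurrence of each directory."""
    result = []
    for entry in list(new_entries) + (existing or "").split(sep):
        if entry and entry not in result:
            result.append(entry)
    return sep.join(result)


if __name__ == "__main__":
    # Naive interpolation with an empty LD_LIBRARY_PATH leaves a trailing ':'.
    naive = Template("/usr/local/nvidia/lib64:/usr/local/nvidia/lib:${LD_LIBRARY_PATH}")
    print(naive.safe_substitute(LD_LIBRARY_PATH=""))
    # /usr/local/nvidia/lib64:/usr/local/nvidia/lib:
    print(prepend_path(NVIDIA_LIBS, None, sep=":"))
    # /usr/local/nvidia/lib64:/usr/local/nvidia/lib
    print(prepend_path(NVIDIA_LIBS, "/opt/cuda/lib64:/usr/local/nvidia/lib", sep=":"))
    # /usr/local/nvidia/lib64:/usr/local/nvidia/lib:/opt/cuda/lib64
```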
