Update NVIDIA driver symlink script #158

Draft
casparvl wants to merge 1 commit into EESSI:main from casparvl:link_nvidia_drivers

Contributor

casparvl commented Feb 4, 2026

We'll need the following variant symlinks to be in place before this script can work as intended:

ln -s '$(EESSI_202506_NVIDIA_OVERRIDE:-/cvmfs/software.eessi.io/defaults/nvidia)' /cvmfs/software.eessi.io/versions/2025.06/compat/linux/x86_64/lib/nvidia
ln -s '$(EESSI_202506_NVIDIA_OVERRIDE:-/cvmfs/software.eessi.io/defaults/nvidia)' /cvmfs/software.eessi.io/versions/2025.06/compat/linux/aarch64/lib/nvidia
ln -s '$(EESSI_202506_NVIDIA_OVERRIDE:-/cvmfs/software.eessi.io/defaults/nvidia)' /cvmfs/software.eessi.io/versions/2025.06/compat/linux/riscv64/lib/nvidia

And then:

ln -s '$(EESSI_NVIDIA_OVERRIDE_DEFAULT:-/dev/null)' /cvmfs/software.eessi.io/defaults/nvidia
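The two levels of variant symlinks above form a fallback chain. As a rough illustration, the sketch below mimics that order with ordinary bash parameter expansion. CernVM-FS performs the real expansion inside the client, so this is only a model of the intended behaviour; `resolve_nvidia_dir` is a hypothetical helper and `/opt/eessi/nvidia-2025.06` is a made-up path used purely for the demo.

```shell
#!/usr/bin/env bash
# Sketch only: models the fallback order of the two variant symlinks.
# resolve_nvidia_dir is a hypothetical helper, not part of EESSI or CernVM-FS.
resolve_nvidia_dir() {
    # Level 2: defaults/nvidia -> $(EESSI_NVIDIA_OVERRIDE_DEFAULT:-/dev/null)
    local default_target="${EESSI_NVIDIA_OVERRIDE_DEFAULT:-/dev/null}"
    # Level 1: .../lib/nvidia -> $(EESSI_202506_NVIDIA_OVERRIDE:-<defaults/nvidia>)
    echo "${EESSI_202506_NVIDIA_OVERRIDE:-$default_target}"
}

unset EESSI_202506_NVIDIA_OVERRIDE EESSI_NVIDIA_OVERRIDE_DEFAULT

# Neither variable set: falls all the way through to /dev/null.
resolve_nvidia_dir

# Only the version-independent default set.
( EESSI_NVIDIA_OVERRIDE_DEFAULT=/opt/eessi/nvidia; resolve_nvidia_dir )

# Version-specific override set (hypothetical path): it takes precedence.
( EESSI_NVIDIA_OVERRIDE_DEFAULT=/opt/eessi/nvidia
  EESSI_202506_NVIDIA_OVERRIDE=/opt/eessi/nvidia-2025.06
  resolve_nvidia_dir )
```

The version-specific variable wins when set; otherwise the shared default applies, and with neither set the chain ends at /dev/null.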

This can then be quite easily tested from within the container:

./eessi_container.sh -a rw -r software.eessi.io -b $<host-software-layer-scripts>:/software-layer-scripts --nvidia all
cd /software-layer-scripts/scripts/gpu_support/nvidia
./link_nvidia_host_libraries.sh

This should error out, stating that the variant symlink resolves to /dev/null. You can then set EESSI_NVIDIA_OVERRIDE_DEFAULT in /etc/cvmfs/default.local (e.g. to /opt/eessi/nvidia) and run the linking script again; this should then install the symlinks.
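For reference, a check of this kind could look roughly like the following. This is a minimal sketch and not the logic of link_nvidia_host_libraries.sh itself: `check_nvidia_target` is a hypothetical helper, and the demo runs against throwaway symlinks in a temporary directory rather than the real CVMFS paths.

```shell
#!/usr/bin/env bash
# Sketch only, not taken from link_nvidia_host_libraries.sh.
# check_nvidia_target is a hypothetical helper: it fails when the given
# path resolves to /dev/null, i.e. when no override has been configured.
check_nvidia_target() {
    local link="$1" target
    target="$(readlink -f "$link")" || return 2
    if [ "$target" = "/dev/null" ]; then
        echo "ERROR: $link resolves to /dev/null; set an override first" >&2
        return 1
    fi
    echo "$target"
}

# Demo against throwaway symlinks instead of the real CVMFS paths.
demo="$(mktemp -d)"
ln -s /dev/null "$demo/unset"
mkdir "$demo/drivers"
ln -s "$demo/drivers" "$demo/configured"

check_nvidia_target "$demo/unset" || echo "unset: rejected"
check_nvidia_target "$demo/configured" > /dev/null && echo "configured: accepted"

rm -rf "$demo"
```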

Contributor Author

casparvl commented Feb 4, 2026

Although we don't have the symlinks yet, I can actually already test this in the container - it will just create the symlinks in /cvmfs/software.eessi.io/versions/2025.06/compat/linux/x86_64/lib/nvidia/ in the writeable overlay. That's fine.

What I did:

$ cd /software-layer-scripts/scripts/gpu_support/nvidia/
$ umask 0022
$ source /cvmfs/software.eessi.io/versions/2025.06/init/lmod/bash
# For some reason this failed to load the module - some module cache issue?
$ module load EESSI/2025.06
$ cat > dummy.c <<'EOF'
int main(void) { return 0; }
EOF
$ gcc -Wall -Wl,--no-as-needed -lcuda dummy.c -o dummy -L /cvmfs/software.eessi.io/versions/2025.06/compat/linux/x86_64/lib/nvidia/
# singularity has /.singularity.d/libs with the CUDA drivers in the LD_LIBRARY_PATH, but those are not the ones we want to find...
$ unset LD_LIBRARY_PATH
$ ldd dummy
        linux-vdso.so.1 (0x00007ffc59bb4000)
        libcuda.so.1 => /cvmfs/software.eessi.io/versions/2025.06/compat/linux/x86_64/lib/nvidia/libcuda.so.1 (0x000014f19b377000)
...

Works as intended. After implementing the variant symlinks, we should retest, try to use the EESSI_NVIDIA_OVERRIDE_DEFAULT symlink, and, once that works, try again using the EESSI_202506_NVIDIA_OVERRIDE variant symlink.
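When retesting, the manual ldd inspection above could be scripted. The sketch below is one possible way to do it: `libcuda_path` is a hypothetical helper that pulls the resolved libcuda.so.1 path out of ldd output, and the sample input is just the two lines shown in the transcript, so the sketch runs outside the container as well.

```shell
#!/usr/bin/env bash
# Sketch only: libcuda_path is a hypothetical helper that extracts the
# resolved path of libcuda.so.1 from `ldd` output read on stdin.
libcuda_path() {
    awk '$1 == "libcuda.so.1" && $2 == "=>" { print $3 }'
}

expected=/cvmfs/software.eessi.io/versions/2025.06/compat/linux/x86_64/lib/nvidia/libcuda.so.1

# In the container this would be: ldd dummy | libcuda_path
# Here the lines from the transcript are fed in instead.
sample='        linux-vdso.so.1 (0x00007ffc59bb4000)
        libcuda.so.1 => /cvmfs/software.eessi.io/versions/2025.06/compat/linux/x86_64/lib/nvidia/libcuda.so.1 (0x000014f19b377000)'

found="$(printf '%s\n' "$sample" | libcuda_path)"
if [ "$found" = "$expected" ]; then
    echo "libcuda.so.1 resolved via the EESSI nvidia directory"
else
    echo "unexpected libcuda.so.1 location: ${found:-not found}"
fi
```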
