show expected and problematic output produced by deviceQuery in GPU docs #139
boegel wants to merge 1 commit into EESSI:main
    If the `deviceQuery` command can not access your GPU, you will see an error message like:
This shouldn't actually happen though. Because of the Lmod guards, the only scenario I can see where you would reach this is when you are using a container and the system drivers are too old.
I triggered it by cleaning out the host_injections directory after loading the module.
I agree it's very unlikely that it happens, but we should mention it in the docs regardless, if only to let people easily find this page when searching for error messages.
My concern is that this placement makes it seem likely that the command will not work, whereas reaching this message is actually very unlikely.
Maybe a little box titled "What does it look like if the command fails?"
    @@ -152,10 +152,32 @@ The only scenario where this would be required is if `$LD_LIBRARY_PATH` is modif

    ### Testing the GPU support {: #gpu_cuda_testing }
Currently, this only covers testing whether you can run CUDA-enabled software from EESSI. Maybe we can also include a short instruction for testing whether building new CUDA software on top of EESSI works properly. Something like this:

First, create a file `hello_cuda.cu` with the contents:

    #include <stdio.h>

    // Kernel: runs on the GPU and prints a greeting
    __global__ void helloCUDA()
    {
        printf("Hello, CUDA!\n");
    }

    int main()
    {
        // Launch the kernel with one block containing one thread
        helloCUDA<<<1, 1>>>();
        // Wait for the kernel to finish so its output is flushed
        cudaDeviceSynchronize();
        return 0;
    }

Then run:

    module load CUDA/<some_version>
    nvcc hello_cuda.cu -o hello_cuda
    chmod u+x hello_cuda
    ./hello_cuda
And mention they should test this for each version of CUDA they installed in host_injections
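A minimal sketch of how that per-version check could be scripted, assuming a bash shell with Lmod's `module` function available and the `hello_cuda.cu` file from the comment above. The function name `check_cuda_versions` is made up for illustration, and the version strings must be supplied by the caller. Because the example program does not check CUDA error codes, the sketch looks for the expected greeting in the output rather than trusting the exit status.

```shell
#!/usr/bin/env bash
# Sketch only: build and run a CUDA source file once per CUDA module version.
# check_cuda_versions is a hypothetical helper, not part of EESSI.
# Usage: check_cuda_versions <source.cu> <version> [<version> ...]
check_cuda_versions() {
    local src="$1"; shift
    local version out failed=0
    for version in "$@"; do
        # The command substitution runs in a subshell, so the loaded
        # module does not leak into the next iteration.
        out=$(module load "CUDA/${version}" >/dev/null 2>&1 \
              && nvcc "$src" -o hello_cuda \
              && ./hello_cuda) || true
        # The kernel prints the greeting only if it actually ran on a GPU.
        if [[ "$out" == *"Hello, CUDA!"* ]]; then
            echo "CUDA/${version}: OK"
        else
            echo "CUDA/${version}: FAILED"
            failed=1
        fi
    done
    return "$failed"
}
```

It would be called as `check_cuda_versions hello_cuda.cu <version_1> <version_2>`, with one argument per CUDA version installed in host_injections.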
Makes sense, but that should be done in a separate PR?
If you want, sure. I won't block this one over it :) Although I would consider it to be an integral part of "Testing the GPU support" to be honest :)
I don't see it as so integral if we are focused on software consumers; it's only integral if you want to do development-type work.
    If the `deviceQuery` command can not access your GPU, you will see an error message like:
    ```
    cudaGetDeviceCount returned 35
    -> CUDA driver version is insufficient for CUDA runtime version
    Result = FAIL
    ```
    ```
Suggested change:

    -If the `deviceQuery` command can not access your GPU, you will see an error message like:
    -```
    -cudaGetDeviceCount returned 35
    --> CUDA driver version is insufficient for CUDA runtime version
    -Result = FAIL
    -```
    -```
    +!!! note "What if the `deviceQuery` command fails?"
    +    If the `deviceQuery` command cannot access your GPU, you will see an error message like:
    +    ```
    +    cudaGetDeviceCount returned 35
    +    -> CUDA driver version is insufficient for CUDA runtime version
    +    Result = FAIL
    +    ```
showing output in case it doesn't work is useful for searching purposes...
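
Since the failure output is meant to be searchable, a script can look for the same strings. The following is a rough sketch, assuming bash. The function name `classify_device_query` is made up for illustration, and the `Result = PASS` line for the success case is an assumption, since only the failing output is quoted in this thread.

```shell
#!/usr/bin/env bash
# Sketch only: classify deviceQuery output read from stdin.
# classify_device_query is a hypothetical helper, not part of EESSI or CUDA.
classify_device_query() {
    local output
    output=$(cat)
    if grep -q 'Result = FAIL' <<<"$output"; then
        # Echo the reason lines to stderr, e.g.
        # "-> CUDA driver version is insufficient for CUDA runtime version"
        grep -e 'cudaGetDeviceCount returned' -e '^ *->' <<<"$output" >&2 || true
        echo "fail"
        return 1
    elif grep -q 'Result = PASS' <<<"$output"; then
        # Assumption: a successful run ends with "Result = PASS"
        echo "pass"
    else
        echo "unknown"
        return 2
    fi
}
```

It could be used as `deviceQuery | classify_device_query`, which prints `fail`, `pass`, or `unknown` and returns a non-zero status unless the result is `pass`.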