Fix native library loader silently failing when CUDA DLLs are missing #1555

Open
alinpahontu2912 wants to merge 2 commits into dotnet:main from alinpahontu2912:fix/cuda-dll-silent-load-failure
Conversation

@alinpahontu2912
Member

Change all 'ok = TryLoadNativeLibraryByName(...)' calls to 'ok &= ...' so that failures accumulate instead of being overwritten by subsequent successful loads. Initialize 'ok = true' before the loading chain.

Previously, each load call overwrote the result of the previous one, so if an early CUDA dependency (e.g. cudnn_adv64_9) failed to load but the later LibTorchSharp load succeeded, 'ok' ended up true. This caused:

  • nativeBackendCudaLoaded was set to true despite missing dependencies
  • The fallback loading path was skipped
  • The diagnostic trace (StringBuilder) was discarded
  • Subsequent load attempts were skipped entirely
  • CUDA operations failed later with cryptic errors

Now any single load failure keeps 'ok' as false, ensuring the fallback path is attempted and the full diagnostic trace is preserved in error messages.
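
The difference between the two patterns can be sketched in isolation. This is a minimal illustration, not the loader's actual code: `LoadChainSketch`, `TryLoad`, and the `missing` set are hypothetical stand-ins, and the loop is an assumption made for brevity (the description above speaks of a chain of individual calls). Only the library names come from the description.

```csharp
using System;
using System.Collections.Generic;

public static class LoadChainSketch
{
    // Hypothetical stand-in for the real load call:
    // a "load" succeeds unless the name is listed in `missing`.
    public static bool TryLoad(string name, ISet<string> missing)
    {
        return !missing.Contains(name);
    }

    // Old pattern: plain assignment, so only the last result survives.
    public static bool LoadOverwriting(IEnumerable<string> names, ISet<string> missing)
    {
        bool ok = false;
        foreach (var name in names)
        {
            ok = TryLoad(name, missing);
        }
        return ok;
    }

    // New pattern: start from true and AND in every result,
    // so a single failure keeps ok false for the rest of the chain.
    public static bool LoadAccumulating(IEnumerable<string> names, ISet<string> missing)
    {
        bool ok = true;
        foreach (var name in names)
        {
            ok &= TryLoad(name, missing);
        }
        return ok;
    }
}
```

With `cudnn_adv64_9` marked as missing and `LibTorchSharp` loading after it, `LoadOverwriting` reports success while `LoadAccumulating` reports failure, which is the behavior change described above.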

Fixes #1545

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor

Copilot AI left a comment

Pull request overview

This PR fixes a correctness issue in TorchSharp’s native library loader where a failed CUDA dependency load could be overwritten by later successful loads, causing the loader to incorrectly treat the CUDA backend as initialized and skip the fallback/diagnostics path.

Changes:

  • Initialize ok to true before the native-load attempt chain.
  • Accumulate load results using ok &= TryLoadNativeLibraryByName(...) so any single failure preserves ok == false while still attempting all loads (capturing full diagnostics).
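
The "still attempting all loads" part follows from C# semantics: for `bool` operands, `ok &= f()` means `ok = ok & f()`, and `&` always evaluates its right-hand side. The sketch below contrasts that with a short-circuiting `&&` form, which is not part of this PR and appears only for comparison. `TraceSketch`, `TryLoad`, and the trace format are hypothetical; the `StringBuilder` mirrors the diagnostic trace mentioned in the description.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class TraceSketch
{
    // Hypothetical stand-in for the real load call: records every attempt in the trace.
    public static bool TryLoad(string name, ISet<string> missing, StringBuilder trace)
    {
        bool loaded = !missing.Contains(name);
        trace.Append(name).Append(": ").AppendLine(loaded ? "loaded" : "FAILED");
        return loaded;
    }

    // Pattern from the PR: '&' on bool does not short-circuit,
    // so every library is attempted and traced even after a failure.
    public static bool LoadAllAttempted(IEnumerable<string> names, ISet<string> missing, StringBuilder trace)
    {
        bool ok = true;
        foreach (var name in names)
        {
            ok &= TryLoad(name, missing, trace);
        }
        return ok;
    }

    // Contrast only (not in the PR): '&&' short-circuits,
    // so nothing after the first failure is attempted or traced.
    public static bool LoadShortCircuit(IEnumerable<string> names, ISet<string> missing, StringBuilder trace)
    {
        bool ok = true;
        foreach (var name in names)
        {
            ok = ok && TryLoad(name, missing, trace);
        }
        return ok;
    }
}
```

Both variants return false when an early library is missing, but only the `&=` form leaves a trace entry for every library, which is what makes the resulting error message useful.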

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

Silent Native Library Failure in torch::loadNativeBackEnd

2 participants