@ScSteffen
Contributor

PR: CUDA Backend for SN-HPC Solver + Singularity GPU Workflow + CPU/CUDA QoI Validation

Summary

This PR adds an optional CUDA backend for the SN-HPC solver and wires it into build, runtime dispatch, validation, and container workflows. The core goal is to run HPC SN lattice/hohlraum cases on a GPU while preserving existing CPU behavior when CUDA is not enabled or no GPU is visible.

By default, behavior is unchanged: CUDA is opt-in via -DBUILD_CUDA_HPC=ON, and non-CUDA builds continue to use the existing CPU/OpenMP path.

What Changed

Build and Configuration

  • Added new CMake option BUILD_CUDA_HPC in CMakeLists.txt.
  • Enabled CUDA language/toolchain setup when BUILD_CUDA_HPC=ON.
  • Added CUDA runtime linking (cudart) and CUDA-specific compile settings.
  • Included CUDA solver sources in build graph:
    • src/solvers/snsolver_hpc.cu
    • include/solvers/snsolver_hpc_cuda.hpp

Runtime Solver Dispatch

  • Updated src/main.cpp so HPC runs choose backend at runtime:
    • If CUDA is compiled in and cudaGetDeviceCount(...) > 0, run SNSolverHPCCUDA.
    • Otherwise fall back to existing SNSolverHPC.

CUDA Solver Implementation

  • Added new solver class in:
    • include/solvers/snsolver_hpc_cuda.hpp
    • src/solvers/snsolver_hpc.cu
  • Implemented CUDA kernels for:
    • first-order flux update
    • second-order slope/limiter/flux update
    • finite-volume update
    • RK2 averaging and scalar flux recomputation
  • Added GPU lifecycle and transfer flow:
    • init device and allocate buffers
    • upload static/state data
    • run kernels
    • download state
    • free resources
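As a host-side reference for the last kernel group (RK2 averaging and scalar flux recomputation), the arithmetic can be sketched in plain C++. This is an illustration under assumptions, not the kernel code in src/solvers/snsolver_hpc.cu: the function names are hypothetical, the solution is assumed to be stored cell-major (`sol[cell * nQuad + q]`), and the RK2 step is assumed to be the usual Heun average of the old state and the state after two forward stages.

```cpp
#include <cstddef>
#include <vector>

// Heun-style RK2 averaging (assumed form): sol <- 0.5 * (solOld + sol),
// where sol holds the state after two forward Euler stages.
void rk2Average( const std::vector<double>& solOld, std::vector<double>& sol ) {
    for ( std::size_t i = 0; i < sol.size(); ++i ) {
        sol[i] = 0.5 * ( solOld[i] + sol[i] );
    }
}

// Scalar flux per cell as the quadrature-weighted sum over ordinates:
// phi[c] = sum_q weights[q] * sol[c * nQuad + q]   (cell-major layout assumed)
std::vector<double> computeScalarFlux( const std::vector<double>& sol,
                                       const std::vector<double>& weights,
                                       std::size_t nCells ) {
    const std::size_t nQuad = weights.size();
    std::vector<double> phi( nCells, 0.0 );
    for ( std::size_t c = 0; c < nCells; ++c ) {
        for ( std::size_t q = 0; q < nQuad; ++q ) {
            phi[c] += weights[q] * sol[c * nQuad + q];
        }
    }
    return phi;
}
```

On a GPU, the same loops would typically map to one thread per solution entry for the averaging and one thread per cell for the scalar flux, though the PR does not state the actual launch layout.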

CPU Solver Consistency Updates

  • Updated include/solvers/snsolver_hpc.hpp and src/solvers/snsolver_hpc.cpp:
    • added _scalarFluxPrevIter
    • moved Mass and RMS_flux computation to postprocessing based on synchronized scalar flux
  • This aligns how the CPU and CUDA paths compute these QoIs and reduces ordering-related discrepancies in CPU/CUDA comparisons.
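A postprocessing step of this kind might look like the sketch below. The definitions are assumptions made for illustration, since the PR text does not spell them out: mass is taken as the area-weighted sum of the scalar flux, and RMS flux as the square root of the summed squared change against the previous iterate. `computeMass`, `computeRmsFlux`, and their parameters are hypothetical names.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed definition: mass = sum_c area[c] * phi[c].
double computeMass( const std::vector<double>& phi, const std::vector<double>& areas ) {
    double mass = 0.0;
    for ( std::size_t c = 0; c < phi.size(); ++c ) mass += areas[c] * phi[c];
    return mass;
}

// Assumed definition: rms = sqrt( sum_c ( phi[c] - phiPrev[c] )^2 ),
// with phiPrev playing the role of _scalarFluxPrevIter.
double computeRmsFlux( const std::vector<double>& phi, const std::vector<double>& phiPrev ) {
    double sumSq = 0.0;
    for ( std::size_t c = 0; c < phi.size(); ++c ) {
        const double diff = phi[c] - phiPrev[c];
        sumSq += diff * diff;
    }
    return std::sqrt( sumSq );
}
```

Because both quantities are computed from the finished scalar flux rather than inside the update loop, a CPU run and a GPU run that agree on the scalar flux would produce matching values regardless of the order in which cells were updated.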

Validation Coverage

  • Added CUDA validation block in tests/test_cases.cpp:
    • SN_SOLVER_HPC_CUDA_QOI_VALIDATION
  • New CPU/CUDA comparison configs in tests/input/validation_tests/SN_solver_hpc/:
    • lattice_hpc_200_cpu_order1.cfg
    • lattice_hpc_200_cpu_order2.cfg
    • lattice_hpc_200_cuda_order1.cfg
    • lattice_hpc_200_cuda_order2.cfg
    • symmetric_hohlraum_hpc_200_cpu_order2.cfg
    • symmetric_hohlraum_hpc_200_cuda_order2.cfg
  • Updated reference CSV baselines:
    • lattice_hpc_200_csv_reference
    • symmetric_hohlraum_hpc_200_csv_reference
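A CPU-vs-CUDA comparison of CSV QoI rows could be sketched as follows. This is not the test code in tests/test_cases.cpp: `parseCsvRow` and `rowsMatch` are hypothetical helpers, the sketch handles numeric rows only (no header line), and the tolerance is a caller-supplied assumption because the PR does not state one.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split one comma-separated line into doubles.
// std::stod throws std::invalid_argument on non-numeric fields such as headers.
std::vector<double> parseCsvRow( const std::string& line ) {
    std::vector<double> values;
    std::stringstream stream( line );
    std::string field;
    while ( std::getline( stream, field, ',' ) ) values.push_back( std::stod( field ) );
    return values;
}

// Two rows match if they have the same length and every pair of entries
// agrees within relTol, scaled by max(1, |a|, |b|) so values near zero
// are effectively compared with an absolute tolerance.
bool rowsMatch( const std::vector<double>& cpu, const std::vector<double>& cuda, double relTol ) {
    if ( cpu.size() != cuda.size() ) return false;
    for ( std::size_t i = 0; i < cpu.size(); ++i ) {
        const double scale = std::max( { 1.0, std::abs( cpu[i] ), std::abs( cuda[i] ) } );
        if ( std::abs( cpu[i] - cuda[i] ) > relTol * scale ) return false;
    }
    return true;
}
```

A tolerance-based check is used here rather than exact equality because floating-point sums on a GPU are generally accumulated in a different order than on the CPU, so bitwise-identical QoIs should not be expected.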

Container and Developer Workflow

  • Added CUDA-capable Singularity definition:
    • tools/singularity/kit_rt_MPI_cuda.def
  • Added CUDA interactive run helper:
    • tools/singularity/singularity_run_interactive_cuda.sh
  • Updated container build/install helper scripts:
    • tools/singularity/build_container.sh
    • tools/singularity/install_kitrt_singularity.sh
  • README now documents separate install/run flows for:
    • CPU (OpenMP)
    • CPU (OpenMP + MPI)
    • CPU + single GPU (OpenMP + CUDA)
    • debug and coverage builds

Why This Is Needed

The SN-HPC path had no native in-repo GPU backend. This PR introduces a practical single-GPU CUDA implementation and connects it to reproducible build and execution paths, including containerized workflows and automated CPU-vs-CUDA QoI comparison tests.

Behavior and Compatibility

  • Default behavior unchanged unless BUILD_CUDA_HPC=ON.
  • When CUDA is enabled but no GPU is visible, runtime falls back to CPU HPC solver.
  • Existing CPU/MPI flows remain available.

Validation Strategy

  • Added automated CPU-vs-CUDA CSV QoI comparisons for lattice order-1 and order-2 cases.
  • Refreshed HPC CSV references to reflect revised mass/RMS computation ordering.
  • Added explicit Singularity --nv CUDA run flow in docs/scripts.

Known Limitations and Follow-up Work

  • CUDA backend is currently single-GPU and pins to device 0.
  • Boundary artifacts are still noted in lattice runs and need follow-up.
  • Current implementation prioritizes correctness and parity; deeper performance tuning is out of scope for this PR.

Reviewer Guide

Suggested review order:

  1. CMakeLists.txt
  2. src/main.cpp
  3. src/solvers/snsolver_hpc.cu
  4. src/solvers/snsolver_hpc.cpp
  5. tests/test_cases.cpp
  6. tools/singularity/kit_rt_MPI_cuda.def
  7. README.md

@ScSteffen ScSteffen self-assigned this Feb 12, 2026
@ScSteffen ScSteffen added the enhancement New feature or request label Feb 12, 2026
@ScSteffen ScSteffen merged commit a65ac29 into master Feb 12, 2026
1 check passed
@ScSteffen ScSteffen deleted the cuda_sn_solver branch February 12, 2026 18:38