feat: add AMD GPU (amdflang/OpenMP offload) container #1422
sbryngelson wants to merge 7 commits into MFlowCode:master from
…h timeout

The CPU container job used QEMU to cross-compile linux/arm64 on a single x86 runner and consistently hit the 6-hour GitHub Actions job limit, so every recent release (v5.1.3 through v5.3.1) failed to publish latest-cpu.

Fix: split the build into two native jobs (ubuntu-22.04 and ubuntu-22.04-arm), mirroring the existing GPU build pattern, and remove QEMU. The per-arch images are merged into a multi-arch manifest in the manifests job using buildx imagetools.

Also: add a weekly schedule trigger (Sunday midnight UTC) so the devcontainer image stays fresh between releases, and bump build-push-action to v6.
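The manifest step above can be sketched as a single `docker buildx imagetools create -t TARGET SOURCE...` call, which assembles a multi-arch manifest list from images that are already in the registry. This is a minimal sketch, not the workflow's actual step: the image name `example/mfc`, the `-amd64`/`-arm64` tag suffixes, and the `run`/`merge_multiarch` helpers are hypothetical, and the `DRY_RUN` switch exists only so the command can be inspected without pushing.

```shell
# Sketch: merge two natively built single-arch images into one multi-arch tag.
# Hypothetical: image name, -amd64/-arm64 suffixes, and helper names.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1 = image name, $2 = final tag (e.g. latest-cpu)
merge_multiarch() {
  run docker buildx imagetools create -t "$1:$2" "$1:$2-amd64" "$1:$2-arm64"
}
```

For example, `DRY_RUN=1 merge_multiarch example/mfc latest-cpu` prints the `imagetools create` command instead of contacting the registry. Because each arch is built on its own native runner, this merge step only rewrites manifest metadata and does not rebuild anything.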
- Dockerfile: add a TARGET=amd branch that downloads the AFAR drop from repo.radeon.com, installs cmake 3.28 (3.22 doesn't recognise LLVMFlang), and builds MPICH 3.4.3 with amdflang so mpi.mod is compiler-compatible; the runtime libs libnuma1/libdrm2 are added so only --rocm is needed at Apptainer runtime
- docker.yml: add an amd matrix entry with build/push/manifest steps; run the cpu build natively on amd64/arm64 instead of cross-building under QEMU; add a weekly cron
- CMakeLists.txt: make the Cray-specific MPI/hipfft paths conditional on CRAY_MPICH_INC/CRAY_HIPFORT_LIB being set; otherwise fall back to standard find_package(MPI) and find_library(hipfft/amdhip64), so the self-contained container image works without any OLCF env vars loaded
- toolchain: add the amd90a cluster profile (HPCFund gfx90a / MI250); fix the module variable export loop so vars that reference previously exported vars expand correctly
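The export-loop fix comes down to ordering: each value must be expanded only after every earlier entry has been exported. A minimal sketch of a loop with that property follows; the `NAME=value` entry format, the `export_in_order` function, and the `MY_ROCM_ROOT`/`MY_ROCM_BIN` names are hypothetical and not taken from the real amd90a profile or the toolchain code.

```shell
# Sketch: export "NAME=value" entries in file order, expanding $REFS in each
# value against the environment as it stands after the previous exports.
# Hypothetical entry format and variable names.
export_in_order() {
  while IFS= read -r entry; do
    [ -z "$entry" ] && continue
    name=${entry%%=*}
    value=${entry#*=}
    # eval performs the deferred $VAR expansion now, after earlier exports.
    # Only use this on trusted profile files: eval runs any embedded commands.
    eval "value=\"$value\""
    export "$name=$value"
  done
}

# Quoted 'EOF' keeps $MY_ROCM_ROOT literal until the loop expands it.
export_in_order <<'EOF'
MY_ROCM_ROOT=/opt/rocm-afar
MY_ROCM_BIN=$MY_ROCM_ROOT/bin
EOF
echo "$MY_ROCM_BIN"   # /opt/rocm-afar/bin
```

If all values were expanded up front, before any export, `MY_ROCM_BIN` would come out as `/bin`, because `MY_ROCM_ROOT` would still be unset at that point.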
Claude Code Review
Head SHA: 1557fc4
Files changed:
Findings:
1. `ENV OLCF_AFAR_ROOT=/opt/${AFAR_VERSION}`: When the …
2. `- find_package(hipfort COMPONENTS hip CONFIG REQUIRED)`
   `- target_link_libraries(${a_target} PRIVATE hipfort::hip hipfort::hipfort-amdgcn flang_rt.hostdevice)`
   The post-change Cray block (context lines 704–705) now links only …
ENV OLCF_AFAR_ROOT=/opt/${AFAR_VERSION} expanded to /opt/ in cpu/gpu
images because those builds supply no AFAR_VERSION. Introduce a
dedicated OLCF_AFAR_ROOT build-arg (default "") so cpu/gpu images get
an empty var and only the AMD build passes the real path.
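The failure is ordinary `${VAR}` substitution: an unset build-arg expands to the empty string, leaving only the literal `/opt/` prefix. The snippet below reproduces this in plain shell and contrasts it with the `${VAR:+word}` form, which yields nothing when the variable is unset or empty. The Dockerfile reference also documents a `${variable:+word}` modifier, so that would be one alternative; this PR instead introduces a dedicated `OLCF_AFAR_ROOT` build-arg. The `afar_root_naive`/`afar_root_guarded` function names are hypothetical.

```shell
# Sketch: why /opt/${AFAR_VERSION} collapses to /opt/ when no version is given.

afar_root_naive() {
  # Mirrors ENV OLCF_AFAR_ROOT=/opt/${AFAR_VERSION}
  echo "/opt/${AFAR_VERSION}"
}

afar_root_guarded() {
  # Empty unless AFAR_VERSION is set and non-empty
  echo "${AFAR_VERSION:+/opt/$AFAR_VERSION}"
}

unset AFAR_VERSION
afar_root_naive      # /opt/
afar_root_guarded    # (empty line)
AFAR_VERSION=rocm-afar-8873-drop-22.2.0
afar_root_guarded    # /opt/rocm-afar-8873-drop-22.2.0
```

An empty `OLCF_AFAR_ROOT` is safer than `/opt/` because tools that treat the variable as optional can test for emptiness, whereas `/opt/` looks like a valid but wrong install root.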
Summary
- Dockerfile: new `TARGET=amd` branch that downloads the AFAR drop (`rocm-afar-8873-drop-22.2.0`) from `repo.radeon.com`, installs cmake 3.28 (Ubuntu 22.04 ships 3.22, which doesn't recognise `LLVMFlang`), builds MPICH 3.4.3 with `amdflang` as the Fortran compiler so `mpi.mod` is compiler-compatible, and includes `libnuma1`/`libdrm2`/`libdrm-amdgpu1` so only `--rocm` is needed at Apptainer runtime
- docker.yml: `amd` matrix entry with full build/push/manifest steps; the `$TAG-amd` manifest is pushed on every run, `latest-amd` on release only
- CMakeLists.txt: Cray-specific MPI/hipfft paths are now conditional on `CRAY_MPICH_INC`/`CRAY_HIPFORT_LIB` being set; otherwise it falls back to standard `find_package(MPI)` and `find_library(hipfft/amdhip64)` with `$OLCF_AFAR_ROOT` hints, so the self-contained container works without any OLCF env vars loaded
- toolchain: `amd90a` cluster profile (HPCFund gfx90a / MI250); fixes the module variable export loop so vars that reference previously exported vars expand in the right order

Validation
- `mfc-amd-final.sif` (Apptainer) on top of Ubuntu 22.04 + AFAR + cmake 3.28 + MPICH 3.4.3
- `apptainer exec --writable-tmpfs --rocm`

Test plan
- `TARGET=amd` (compile + dry-run, no GPU runner needed in CI)
- `cpu` and `gpu` builds unaffected
- `latest-amd` manifest pushed only on release trigger
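The tagging rule being tested ($TAG-amd always, latest-amd only on release) fits in a small shell function. This is a sketch, not the workflow's actual step: the function name `amd_manifest_tags` is hypothetical, and it assumes the trigger is passed as a GitHub Actions event name such as `release` or `schedule`, defaulting to the `GITHUB_EVENT_NAME` variable that Actions sets in run steps.

```shell
# Sketch: which AMD manifest tags to publish for a given trigger.
# $1 = version tag (e.g. v5.3.1)
# $2 = event name (optional; defaults to $GITHUB_EVENT_NAME)
amd_manifest_tags() {
  tag=$1
  event=${2:-${GITHUB_EVENT_NAME:-}}
  echo "${tag}-amd"          # published on every run
  if [ "$event" = "release" ]; then
    echo "latest-amd"        # only a release moves the latest pointer
  fi
}

amd_manifest_tags v5.3.1 release    # prints v5.3.1-amd, then latest-amd
amd_manifest_tags v5.3.1 schedule   # prints v5.3.1-amd only
```

Keeping `latest-amd` release-only means the weekly scheduled builds refresh versioned tags without moving the pointer that users pull by default.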