refactor: consolidate grid, stretch, and body force params into derived types#1432
Open
sbryngelson wants to merge 41 commits into
Conversation
Codecov Report
Additional details and impacted files:
@@ Coverage Diff @@
## master #1432 +/- ##
==========================================
- Coverage 61.31% 61.07% -0.24%
==========================================
Files 72 72
Lines 19771 19650 -121
Branches 2849 2856 +7
==========================================
- Hits 12123 12002 -121
- Misses 5699 5707 +8
+ Partials 1949 1941 -8
☔ View full report in Codecov by Sentry.
Replace flat allocatable arrays x_cb/y_cb/z_cb, x_cc/y_cc/z_cc,
and dx/dy/dz with a derived type having %cb, %cc, and %spacing
components. All three executables (pre_process, simulation,
post_process) updated across 47 files.
Key design decisions:
- pre_process keeps scalar dx/dy/dz as minimum cell-width scalars;
only x_cb and x_cc are folded into x%cb and x%cc
- OpenMP GPU target uses whole-struct declare target (x, y, z) since
component-level declare target is invalid; OpenACC uses component-level
- 2dHardcodedIC.fpp wraps dx*dy in #ifdef MFC_PRE_PROCESS for the
scalar vs per-cell context difference
Special variable collisions fixed:
- m_chemistry.fpp: local integer x/y/z -> cx/cy/cz
- m_weno.fpp: local real y(1:4) scratch -> ys
- m_viscous.fpp: local real dx(1:3) scratch -> ds
- m_ibm.fpp: local scalar dx/dy/dz -> dx_loc/dy_loc/dz_loc
- m_cbc.fpp: Fypp template d${XYZ}$ -> ${XYZ}$%spacing
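The axis grouping in this commit can be pictured with a small Python analogue (illustrative only; the actual code is a Fortran derived type with these three components, and the values below are made up):

```python
from dataclasses import dataclass

# Python analogue of the grid_axis idea: one object per axis replaces the
# three flat arrays (x_cb, x_cc, dx). Values are illustrative.
@dataclass
class GridAxis:
    cb: list[float]       # cell-boundary coordinates (was x_cb)
    cc: list[float]       # cell-center coordinates (was x_cc)
    spacing: list[float]  # per-cell widths (was dx)

x = GridAxis(cb=[0.0, 1.0, 2.0], cc=[0.5, 1.5], spacing=[1.0, 1.0])
```

Accesses like x%cc(i) in the Fortran sources correspond to x.cc[i] here.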
Implements items 7 and 5 from issue #1427:
- x_a/x_b/y_a/y_b/z_a/z_b -> type(bounds_info) :: x_stretch, y_stretch, z_stretch
- bf_x/bf_y/bf_z + k_x/w_x/p_x/g_x (and y/z) -> type(body_force_axis) :: bf_x, bf_y, bf_z
Remove module-level dx, dy, dz scalars from pre_process m_global_parameters. Add min_spacing field to the grid_axis derived type so each axis carries its own minimum cell width. Update all call sites in m_grid, m_start_up, m_icpp_patches, m_mpi_common, and 2dHardcodedIC.
After reading grid data from files, compute and store min_spacing on each axis in both serial and parallel paths. Matches the pre_process pattern so min_spacing is consistent across all three executables.
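A minimal sketch of the per-axis minimum cell width computation (hypothetical helper; the real code stores the result in each axis's min_spacing field):

```python
def min_spacing(cb: list[float]) -> float:
    # cb: sorted cell-boundary coordinates along one axis; the minimum
    # cell width is the smallest gap between adjacent boundaries.
    return min(b - a for a, b in zip(cb, cb[1:]))
```

For boundaries [0.0, 0.5, 1.5, 3.0] the cell widths are 0.5, 1.0, 1.5, so the stored minimum is 0.5.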
Group the three directional boundary condition variables (bc_x, bc_y, bc_z of type bc_dir_t) into a single bc_xyz_info struct accessed as bc%x, bc%y, bc%z. Updates all Fortran source, Fypp macros, Python toolchain, example cases, and documentation.
PR #1432 renamed bc_x%beg -> bc%x%beg. The remove_higher_dimensional_keys helper only matched the old _y/_z separator style (.+_y, y_.+), so bc%y%beg and bc%z%beg were not removed for lower-dimensional cases. Add %{dim}% substring check to cover the new compound key format.
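A sketch of the fixed filter, assuming a helper of roughly this shape (the old separator-style patterns and the added %{dim}% substring check come from the description above; the surrounding signature and dict format are hypothetical):

```python
import re

def remove_higher_dimensional_keys(case: dict, num_dims: int) -> dict:
    """Drop y-/z-direction keys from a case dict for lower-dimensional runs."""
    # Directions that do not exist for this case dimensionality.
    dropped = {1: ("y", "z"), 2: ("z",), 3: ()}[num_dims]

    def is_higher_dim(key: str) -> bool:
        root = key.split("%", 1)[0]
        for d in dropped:
            # Old flat separator styles: .+_y / y_.+ (and z equivalents).
            if re.fullmatch(rf".+_{d}", root) or re.fullmatch(rf"{d}_.+", root):
                return True
            # New compound style introduced by this PR: bc%y%beg, bf%z%k, ...
            if f"%{d}%" in key:
                return True
        return False

    return {k: v for k, v in case.items() if not is_higher_dim(k)}
```

With only the old checks, bc%y%beg slips through for a 1D case; the substring test catches it.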
- m_thinc.fpp: take master's extended Fypp for-loop tuple (STENCIL_VAR, COORDS, X_BND/Y_BND/Z_BND), update CC_PRI x_cc/y_cc/z_cc -> x%cc/y%cc/z%cc
- m_rhs.fpp: take master's drop of the 'dummy' workaround condition, keep bc%y%beg naming
- m_riemann_solvers.fpp: take master's unified Re_avg_rsx_vf indexing (j,k,l) for all cylindrical faces, update y_cb/y_cc -> y%cb/y%cc
wilfonba reviewed May 13, 2026
…at _in/_out fields
Group the three body-force axis structs (bf_x, bf_y, bf_z) into a single body_force_t container variable bf, matching the bc%x/y/z compound naming pattern. Updates all Fortran source, Python toolchain, examples, and docs.
- examples/3D_rayleigh_taylor/case.py: bf_y%* -> bf%y%*
- case_validator.py: check_body_forces uses bf%{dir}%enabled/k/w/p/g
- descriptions.py: regex pattern bf_([xyz]) -> bf%([xyz])%enabled
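A hypothetical sketch of the toolchain-side rename: body-force parameter names move from the flat bf_x/bf_y/bf_z style to the compound bf%x/bf%y/bf%z style, and the validation regex is updated to match (field names follow the list above; the helper name is illustrative):

```python
import re

# Updated pattern: compound bf%<dir>%<field> keys replace flat bf_<dir> names.
NEW_BF = re.compile(r"bf%([xyz])%(enabled|k|w|p|g)")

def body_force_keys(direction: str) -> list[str]:
    """All per-axis body-force keys for one direction, in the new naming."""
    return [f"bf%{direction}%{f}" for f in ("enabled", "k", "w", "p", "g")]
```

Every generated key matches the updated pattern, while old-style names like bf_y no longer do.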
Replace flat rho_L/rho_R, pres_L/pres_R, E_L/E_R, H_L/H_R, gamma_L/R, pi_inf_L/R, qv_L/R, c_L/R, T_L/R, MW_L/R, R_gas_L/R, Cp_L/R, Cv_L/R, Gamm_L/R, G_L/R, Y_L/R, vel_L_rms/R_rms, vel_L_tmp/R_tmp, nbub_L/R, ptilde_L/R with type(riemann_states) :: rho, pres, E, H, ..., accessed as rho%L / rho%R.
Replace vel_L/vel_R with type(riemann_states_vec3) :: vel, accessed as vel%L(i) / vel%R(i).
HLLD already used this pattern; HLL, LF, and HLLC now match. GPU private lists updated to name the struct once instead of _L and _R.
Cray ftn (CCE) requires globals accessed in GPU accelerator routines to have declare directives. bc (type bc_xyz_info) was declared for OpenMP but omitted from the OpenACC block, causing:
ftn-7066 ftn: ERROR S_SLIP_WALL: Global in accelerator routine without declare -- bc
Add $:GPU_DECLARE(create='[bc]') to the MFC_OpenACC branch alongside the existing OpenMP declaration.
…states structs
Extend the riemann_states refactor to cover wave speeds, intermediate states, stress tensor, Reynolds numbers, and HLLD state vectors:
- s_L/R -> s%L/R (type riemann_states)
- Ms_L/R -> Ms%L/R
- pres_SL/SR -> pres_S%L/R
- flux_tau_L/R -> flux_tau%L/R
- xi_L/R -> xi%L/R (HLLC)
- xi_L/R_m1 -> xi_m1%L/R (HLLC)
- pTot_L/R -> pTot%L/R (HLLD)
- rhoL/R_star -> rho_star%L/R (HLLD)
- E_starL/R -> E_star%L/R (HLLD)
- vL/R_star -> v_star%L/R (HLLD)
- wL/R_star -> w_star%L/R (HLLD)
- tau_e_L/R(i) -> tau_e%L/R(i) (type riemann_states_arr6)
- Re_L/R(i) -> Re%L/R(i) (type riemann_states_arr2)
- xi_field_L/R -> xi_field%L/R (riemann_states_vec3)
- U_L/R etc. -> U%L/R etc. (type riemann_states_arr7, HLLD)
Variable-size arrays (alpha_L/R, Ys_L/R, etc.) are left as-is since they require allocatable components or Fypp-conditional sizing.
Also exclude fp-stability-logs/ and scripts/ from the typos spell check.
type(riemann_states) structs are not auto-private under OpenACC (unlike plain real(wp) scalars). flux_tau_L/flux_tau_R were scalars on master and worked without explicit listing; after the riemann_states refactor they became flux_tau (a derived type) which requires explicit private=[flux_tau]. HLLC had it; LF was missed.
This file is auto-generated by the test runner to track failures for CI retry logic. It should never be committed.
This was referenced May 15, 2026
…MP declare target compatibility
…t struct
Replace 6 module-level vector_field arrays (dqL/R_prim_dx/dy/dz_n) with 2 dq_prim_dir_t struct instances (dqL_prim_n, dqR_prim_n) whose %x/%y/%z members hold the same data. Subroutine signatures continue to receive individual allocatable arrays to avoid GPU illegal-address errors that arise when a non-allocatable struct is passed as a dummy arg to kernels.
…ine struct member access
OpenMP GPU device subroutines cannot reliably access grid_axis struct member allocatables (x%spacing, x%cc, x%cb) through the device descriptor. This caused A57E30FE (3D Viscous IGR Jacobi) to fail with CUDA_ERROR_ILLEGAL_ADDRESS.
Add flat allocatable arrays dx/dy/dz, x_cc/y_cc/z_cc, x_cb/y_cb/z_cb as GPU-accessible aliases in m_global_parameters.fpp. Replace all struct member accesses in simulation GPU kernel files with these flat arrays. Two sync points are required:
1. Early HOST sync in s_initialize_modules after s_populate_grid_variables_buffers, before the WENO/IGR/CBC/Riemann module initializations that use grid values (e.g. s_initialize_weno_module uses s_cb => x_cb for WENO polynomial coefficients)
2. GPU_UPDATE sync in s_initialize_gpu_vars to copy the flat arrays to the device
Regenerate 16 IGR golden files for the nvfortran OpenMP GPU build on wingtip-gpu3.
…e access
Change grid_axis cb/cc/spacing from allocatable to pointer components backed by the flat module arrays (x_cb/x_cc/dx etc.). GPU pointer attachment via GPU_ENTER_DATA(attach=) updates the device struct's pointer fields to point to the already-mapped device flat arrays, fixing CUDA_ERROR_ILLEGAL_ADDRESS in m_igr.fpp inline GPU_PARALLEL_LOOP bodies on NVHPC OpenMP target offload. Eliminates the early host sync, the duplicate GPU_UPDATE for struct members, and the OpenACC/OpenMP split in GPU_DECLARE for x/y/z.
Replace flat grid array references (dx/dy/dz, x_cc/y_cc/z_cc, x_cb/y_cb/z_cb) with struct member access (x%spacing/y%spacing/z%spacing, x%cc/y%cc/z%cc, x%cb/y%cb/z%cb) throughout simulation kernel files. m_igr.fpp retains flat array access: NVHPC OpenMP target offload does not correctly resolve declare-target struct pointer components in that file's inline GPU_PARALLEL_LOOP bodies, causing CUDA_ERROR_LAUNCH_FAILED.
…inter-in-atomic-region bug
NVHPC cannot correctly dereference declare-target struct pointer components (x%spacing, y%spacing, z%spacing) inside GPU_ATOMIC blocks in m_igr.fpp. Introduce module-level inv_dx/inv_dy/inv_dz arrays precomputed from the flat spacing arrays and GPU-updated once at init. All GPU_ATOMIC and GPU_PARALLEL_LOOP bodies in this file now use inv_d*(j/k/l) instead of 1._wp/d*(j/k/l), eliminating the pointer indirection that triggers the compiler bug and also removing repeated divisions from the hot atomic path. CPU-only alf_igr computation updated to x%spacing(1) / y%spacing(1) / z%spacing(1) as the struct-member access works correctly on the host.
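The reciprocal precomputation is a standard strength-reduction trick; a tiny Python sketch with made-up spacings (the array names mirror the commit message, everything else is illustrative):

```python
# Precompute reciprocal spacings once at initialization so hot-loop bodies
# multiply instead of dividing (and avoid the struct-pointer dereference
# on device described above). Numbers are made up.
dx = [0.1, 0.2, 0.4]            # per-cell spacings along one axis
inv_dx = [1.0 / d for d in dx]  # computed once, like inv_dx in m_igr.fpp

def gradient(f: list[float], inv_dx: list[float]) -> list[float]:
    # Forward-difference use of the precomputed reciprocals: one multiply
    # per cell in the hot path, zero divisions.
    return [(f[j + 1] - f[j]) * inv_dx[j] for j in range(len(f) - 1)]
```

Each kernel invocation then reuses inv_dx rather than re-evaluating 1/dx per iteration.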
Gamm_L/Gamm_R were consolidated into type(riemann_states) :: Gamm when the riemann_states refactor landed. The HLLC private list was updated but the LF solver (s_lf_riemann_solver) private list was missed, causing ACC find_in_present_table failures for Gamm on Cray OpenACC (Frontier).
…er dereference
x%cc/x%spacing/etc. inside GPU_PARALLEL_LOOP fail on NVHPC and AMD OpenMP target because map(always,to: x%cc) does not correctly update the device struct pointer to point to device data, causing CUDA_ERROR_ILLEGAL_ADDRESS (Phoenix gpu-omp UUID AA49A8BC) and HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION (Frontier AMD gpu-omp UUID 2ADA983F). Revert to flat declare-target arrays (dx/dy/dz, x_cc/y_cc/z_cc) which are directly resolved on device.
…ct pointer dereference"
This reverts commit e651b8f.
…ran pointer attachment
map(always,to/from:) for pointer components copies the host descriptor (with host addresses) to the device, leaving device struct pointers invalid. OpenMP 5.1 map(attach:) correctly looks up the device address of the pointee and updates the device struct pointer to reference device memory; map(detach:) is the symmetric reverse. This fixes CUDA_ERROR_ILLEGAL_ADDRESS (Phoenix gpu-omp) and HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION (Frontier AMD gpu-omp) caused by x%cc/x%spacing etc. being dereferenced as host addresses inside GPU kernels.
Summary
Consolidates several families of flat scalar parameters and module-level arrays into Fortran derived types, reducing global variable count and making structure explicit. Implements items 1, 4, 5, 6, and 7 from issue #1427, plus item 2 (CBC directional triplets, MUSCL bounds) and item 3 (L/R Riemann state triplets).
Grid coordinate arrays → type(grid_axis) (item 1)
- x_cc, x_cb, x_cb_s (and y/z equivalents) → x%cc, x%cb, x%cb_s
- dx, dy, dz min spacing → x%min_spacing, y%min_spacing, z%min_spacing
- x%cc(i) instead of x_cc(i)

Boundary condition types → bc_dir_t + bc_side_t (item 4)
- bc_dir_t to hold BC type (beg/end integers) separately from payload
- bc_x%beg/end → bc%x%beg/end (new compound bc_xyz_info struct)
- bc_side_t with beg_side/end_side sub-structs, replacing 14 flat _in/_out/vb*/ve* fields
- case.py files and golden test files updated
- remove_higher_dimensional_keys handles bc%y%beg-style keys for lower-dimensional cases

Body force parameters → type(body_force_t) (item 5)
- bf_x, bf_y, bf_z variables replaced with a single bf struct
- bf_x%k, bf_x%w, bf_x%p, bf_x%g, bf_x%enabled (and y/z) → bf%x%k, bf%x%w, etc.

IB dynamics → type(ib_dynamics_t) (item 6)
- force_x/y/z, torque_x/y/z, vel_x/y/z, omega_x/y/z, angle_x/y/z → force%x/y/z, etc.

Grid stretching → type(bounds_info) (item 7)
- x_a/x_b, y_a/y_b, z_a/z_b → x_stretch%beg/end, y_stretch%beg/end, z_stretch%beg/end

CBC module directional arrays → private derived types (item 2)
In m_cbc.fpp:
- cbc_rs_dir_t — 4D (:,:,:,:) triplet for reshaped primitive/flux arrays
- cbc_fd_dir_t — 2D (:,:) triplet for finite-difference coefficients
- cbc_pi_dir_t — 3D (:,:,:) triplet for polynomial interpolation coefficients
- q_prim_rs[x/y/z]_vf → q_prim_rs%[x/y/z]
- F_rs[x/y/z]_vf, F_src_rs[x/y/z]_vf → F_rs%[x/y/z], F_src_rs%[x/y/z]
- flux_rs[x/y/z]_vf_l, flux_src_rs[x/y/z]_vf_l → flux_rs%[x/y/z], flux_src_rs%[x/y/z]
- fd_coef_[x/y/z] → fd_coef%[x/y/z]
- pi_coef_[x/y/z] → pi_coef%[x/y/z]
- the _l suffix naming workaround is naturally resolved by the struct naming
- GPU_DECLARE uses %component member syntax, consistent with m_global_parameters.fpp

MUSCL bounds triplet → type(muscl_bounds_t) (item 2)
- is1_muscl, is2_muscl, is3_muscl → is_muscl%x, is_muscl%y, is_muscl%z
- s_muscl dummy argument list reduced from 3 int_bounds_info args to 3 renamed scalars assigned into one struct; GPU_UPDATE collapsed from 3 variables to 1

L/R Riemann state triplets → type(riemann_states_arr) (item 3)
- m_riemann_solvers.fpp

Other
Refactoring this enables
The derived type consolidations here are load-bearing groundwork for several follow-on simplifications:
- Directional sweep loops. With x/y/z triplets in structs, code that currently has three nearly-identical Fypp-generated blocks (one per direction) can be replaced by a single loop over [x, y, z] components. The MUSCL limiter and CBC sweep logic are the most immediate targets — each currently copies the same logic three times via #:for XYZ in [...].
- Slimmer subroutine signatures. Every consolidated triplet removes two arguments from subroutines that previously took separate _x, _y, _z parameters. The remaining dqL/R_prim_dx/dy/dz_n triplets in m_rhs.fpp and m_riemann_solvers.fpp are the next candidate — consolidating them collapses a 3-argument pattern repeated across 7 subroutines into 1.
- Simpler GPU data management. Struct-level GPU_DECLARE and GPU_UPDATE calls cover all components in one directive, eliminating the class of bug where one direction is updated but another is not. This is already visible in the MUSCL change: GPU_UPDATE(device='[is_muscl]') replaces three separate updates.
- Easier extensibility. Adding a new per-direction field (e.g., a new flux component or a new BC attribute) now touches one struct definition instead of 6–10 separate declarations, allocation sites, and call sites. The bc_side_t struct is the clearest example — adding a per-face field now requires one line.
- Foundation for the Riemann solver refactor (#1426). The riemann_states consolidation (item 3) is the prerequisite named in that issue. The struct layout is now in place for the interface overhaul described there.

Test plan
- ./mfc.sh precheck passes (all 6 checks)
- remove_higher_dimensional_keys fix verified

Closes part of #1427.
Closes #1441.
Partially addresses #1440 (CBC and MUSCL directional sweep deduplication).