Description
Motivation
Determinant calculations dominate performance in downstream consumers (e.g. Delaunay flip/repair algorithms). These are safe-code-only changes that should meaningfully reduce latency for small fixed-size matrices.
Proposed Changes
1. Closed-form det for small dimensions
Add specialized paths in Matrix::det (or a new det_fast) for D=1, 2, 3, and 4 that bypass LU factorization entirely:
- 1×1: return a[0][0]
- 2×2: ad - bc via a single mul_add
- 3×3: Sarrus rule (6 mul_add terms)
- 4×4: Laplace/cofactor expansion on first row (reduces to four 3×3 sub-determinants)
Expected: 3–5× speedup over the LU path for these common dimensions, to be confirmed by benchmarks.
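A minimal sketch of what the closed-form paths could look like. The free functions det2, det3, and det4 are hypothetical names used for illustration, not the crate's existing API, and the matrices are assumed to be plain [[f64; D]; D] arrays. The 3×3 case groups the six Sarrus products into three 2×2 minors so that each step maps onto mul_add; the result is algebraically the same.

```rust
/// Hypothetical helper: 2x2 determinant ad - bc using one fused multiply-add.
pub fn det2(m: &[[f64; 2]; 2]) -> f64 {
    m[0][0].mul_add(m[1][1], -(m[0][1] * m[1][0]))
}

/// Hypothetical helper: 3x3 determinant by expansion along the first row.
/// Expanding the three 2x2 minors yields the six Sarrus products.
pub fn det3(m: &[[f64; 3]; 3]) -> f64 {
    let m00 = m[1][1].mul_add(m[2][2], -(m[1][2] * m[2][1]));
    let m01 = m[1][0].mul_add(m[2][2], -(m[1][2] * m[2][0]));
    let m02 = m[1][0].mul_add(m[2][1], -(m[1][1] * m[2][0]));
    // a00*m00 - a01*m01 + a02*m02
    m[0][0].mul_add(m00, (-m[0][1]).mul_add(m01, m[0][2] * m02))
}

/// Hypothetical helper: 4x4 determinant by Laplace expansion along the
/// first row, reducing to four 3x3 sub-determinants.
pub fn det4(m: &[[f64; 4]; 4]) -> f64 {
    let mut det = 0.0_f64;
    let mut sign = 1.0_f64;
    for j in 0..4 {
        // Build the minor obtained by deleting row 0 and column j.
        let mut minor = [[0.0_f64; 3]; 3];
        for r in 1..4 {
            let mut cc = 0;
            for c in 0..4 {
                if c != j {
                    minor[r - 1][cc] = m[r][c];
                    cc += 1;
                }
            }
        }
        det = (sign * m[0][j]).mul_add(det3(&minor), det);
        sign = -sign;
    }
    det
}
```

A dispatcher in Matrix::det could call these for D ≤ 4 and fall back to LU otherwise; how that dispatch is written depends on the existing Matrix type.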
2. Combine non-finite check with pivot search in lu.rs
The current pivot search (lines ~30–46) makes two conceptual passes: scanning for max-abs and separately checking is_finite. Merge them into a single loop body so each entry is touched once.
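The merged loop might look roughly like the sketch below. The function name find_pivot, its signature, and the use of Option to signal a non-finite entry are assumptions for illustration; the actual code in lu.rs may report errors differently.

```rust
/// Hypothetical helper: scans column `col` from row `col` downward once,
/// tracking the largest absolute value and rejecting non-finite entries
/// in the same loop body.
///
/// Returns `Some((row, abs_value))` for the pivot candidate, or `None`
/// if any scanned entry is NaN or infinite.
pub fn find_pivot<const D: usize>(rows: &[[f64; D]; D], col: usize) -> Option<(usize, f64)> {
    let mut best_row = col;
    let mut best_abs = 0.0_f64;
    for r in col..D {
        let v = rows[r][col].abs();
        // abs() preserves NaN and infinity, so one check covers both.
        if !v.is_finite() {
            return None;
        }
        if v > best_abs {
            best_abs = v;
            best_row = r;
        }
    }
    Some((best_row, best_abs))
}
```

The caller would still compare the returned magnitude against its singularity tolerance; that policy is left out of the sketch.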
3. FMA consistency in the elimination loop
lu.rs already uses mul_add in the inner elimination loop but not uniformly across all arithmetic. Audit all multiply-then-add patterns in hot loops and apply mul_add consistently.
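As an illustration of the pattern to look for, a row update of the form row[c] -= factor * pivot_row[c] can be written as a single fused operation. The helper eliminate_row below is a hypothetical stand-in, not the function in lu.rs.

```rust
/// Hypothetical helper: subtracts `factor * pivot_row` from `row` for
/// columns `start..D`, using one fused multiply-add per entry.
pub fn eliminate_row<const D: usize>(
    row: &mut [f64; D],
    pivot_row: &[f64; D],
    factor: f64,
    start: usize,
) {
    for c in start..D {
        // row[c] - factor * pivot_row[c], rounded once instead of twice.
        row[c] = (-factor).mul_add(pivot_row[c], row[c]);
    }
}
```

Because mul_add rounds once, results can differ in the last bit from the separate multiply and subtract, so exact-equality tests may need review. On targets without a hardware FMA instruction, mul_add can also be slower than the unfused form, which is worth checking in the benchmarks.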
4. Replace get/set with direct indexing in hot paths
Lu::factor and Lu::solve_vec call get/set through wrappers that do bounds checks on every access. The indices are in-bounds by construction in those loops — use self.rows[r][c] directly and let the compiler elide the redundant checks.
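A sketch of direct indexing in a solve-style loop, assuming the factor is stored as a [[f64; D]; D] array. The function back_substitute is hypothetical and only covers the upper-triangular half of a solve; it is not the existing Lu::solve_vec. With loop bounds derived from the const parameter D, the compiler has a good chance of proving the indices in range, though that is not guaranteed and should be confirmed with the benchmarks.

```rust
/// Hypothetical helper: solves U x = b for an upper-triangular U with a
/// nonzero diagonal (assumed, not checked), indexing the arrays directly.
pub fn back_substitute<const D: usize>(u: &[[f64; D]; D], b: &[f64; D]) -> [f64; D] {
    let mut x = [0.0_f64; D];
    for i in (0..D).rev() {
        let mut sum = b[i];
        for j in (i + 1)..D {
            // sum -= u[i][j] * x[j], fused.
            sum = (-u[i][j]).mul_add(x[j], sum);
        }
        x[i] = sum / u[i][i];
    }
    x
}
```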
5. Release profile tuning
Add to Cargo.toml:
[profile.release]
lto = "fat"
codegen-units = 1
This enables whole-crate inlining and lets LLVM specialize/merge const-generic monomorphizations more aggressively.
Acceptance Criteria
- All existing tests pass (just test)
- Benchmarks show improvement for d2–d5 determinant cases (cargo bench)
- No unsafe code introduced