
refactor(expm): Adding LinearOperator trait and using that trait to parallelize single vector expm_multiply #46

Merged
weinbe58 merged 3 commits into main from phil/linear-operator-trait
Apr 13, 2026

Conversation

@weinbe58
Contributor

No description provided.

weinbe58 and others added 2 commits April 6, 2026 23:00

Introduce `LinearOperator<V>` trait with `dot`, `dot_transpose`,
`dot_many`, `dot_chunk`, `dot_transpose_chunk`, `trace`, and `onenorm`
methods. Add `QMatrixOperator` adapter that wraps `&QMatrix<M,I,C>` with
baked-in coefficients.

Refactor `algorithm`, `norm_est`, and `params` to be generic over
`Op: LinearOperator<V>`, removing all `&QMatrix + &[coeffs]` parameters
from the Taylor-series core. Update high-level API functions in `mod.rs`
and `Hamiltonian::expm_dot`/`expm_dot_many` in `ham.rs` to construct a
`QMatrixOperator` at the call site.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
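As a rough illustration of the trait's shape, here is a minimal sketch with a toy dense adapter standing in for `QMatrixOperator`. The method names come from the commit message, but the signatures and the `DenseOp` type are assumptions for illustration only; the real trait in `linear_operator.rs` also has `dot_many`, `dot_chunk`, and `dot_transpose_chunk`, omitted here.

```rust
/// Hypothetical sketch of the `LinearOperator` trait described above;
/// signatures are assumed, not the crate's actual API.
pub trait LinearOperator<V> {
    /// y += A * x (full matvec)
    fn dot(&self, x: &[V], y: &mut [V]);
    /// y += A^T * x
    fn dot_transpose(&self, x: &[V], y: &mut [V]);
    /// Sum of diagonal entries.
    fn trace(&self) -> V;
    /// 1-norm: maximum absolute column sum.
    fn onenorm(&self) -> f64;
}

/// Toy dense adapter standing in for `QMatrixOperator` (illustrative only).
struct DenseOp {
    n: usize,
    a: Vec<f64>, // row-major n x n
}

impl LinearOperator<f64> for DenseOp {
    fn dot(&self, x: &[f64], y: &mut [f64]) {
        for i in 0..self.n {
            for j in 0..self.n {
                y[i] += self.a[i * self.n + j] * x[j];
            }
        }
    }
    fn dot_transpose(&self, x: &[f64], y: &mut [f64]) {
        for i in 0..self.n {
            for j in 0..self.n {
                y[j] += self.a[i * self.n + j] * x[i];
            }
        }
    }
    fn trace(&self) -> f64 {
        (0..self.n).map(|i| self.a[i * self.n + i]).sum()
    }
    fn onenorm(&self) -> f64 {
        (0..self.n)
            .map(|j| (0..self.n).map(|i| self.a[i * self.n + j].abs()).sum::<f64>())
            .fold(0.0_f64, f64::max)
    }
}

fn main() {
    // A = [[1, 2], [3, 4]]
    let op = DenseOp { n: 2, a: vec![1.0, 2.0, 3.0, 4.0] };
    let mut y = vec![0.0; 2];
    op.dot(&[1.0, 1.0], &mut y);
    assert_eq!(y, vec![3.0, 7.0]); // row sums
    let mut yt = vec![0.0; 2];
    op.dot_transpose(&[1.0, 1.0], &mut yt);
    assert_eq!(yt, vec![4.0, 6.0]); // column sums
    assert_eq!(op.trace(), 5.0);
    assert_eq!(op.onenorm(), 6.0); // max column sum: |2| + |4|
}
```

The payoff of the trait boundary is that the Taylor-series core no longer needs to thread `&QMatrix + &[coeffs]` pairs through every call; any operator with a matvec can plug in.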
Contributor

Copilot AI left a comment


Pull request overview

This PR refactors the matrix-exponential (expm) subsystem to operate over a new LinearOperator abstraction (instead of being hard-wired to QMatrix + coeffs) and introduces a persistent-thread parallel implementation for single-vector expm_multiply dispatch.

Changes:

  • Introduce LinearOperator + QMatrixOperator adapter and refactor expm parameter selection, norm estimation, and algorithms to use the trait.
  • Add expm_multiply_par and auto-dispatch to the parallel path above a size threshold; add atomic-accumulation utilities to support chunked transpose operations.
  • Update benchmarking script and dependency lockfile.

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 7 comments.

Summary per file:

  • uv.lock — Updates pinned Python toolchain/dev dependency versions.
  • scripts/benchmark.py — Expands benchmark script usage of new Rust bindings / APIs.
  • crates/quspin-core/src/hamiltonian/ham.rs — Wraps QMatrix + evaluated coeffs in QMatrixOperator before calling expm_*.
  • crates/quspin-core/src/expm/mod.rs — Exposes new operator abstraction and parallel entry points; updates auto APIs to use ArrayViewMut1.
  • crates/quspin-core/src/expm/linear_operator.rs — Adds LinearOperator trait and QMatrixOperator implementation, including chunked methods.
  • crates/quspin-core/src/expm/compute.rs — Adds AtomicAccum and concrete atomic accumulators; extends ExpmComputation with an associated atomic type.
  • crates/quspin-core/src/expm/algorithm.rs — Refactors core algorithms to use LinearOperator and adds persistent-thread parallel expm_multiply_par.
  • crates/quspin-core/src/expm/params.rs — Refactors adaptive parameter selection to operate on LinearOperator; updates tests accordingly.
  • crates/quspin-core/src/expm/norm_est.rs — Refactors 1-norm estimator to use LinearOperator for matvecs.


Comment on lines +15 to +19
def _make_operator(L: int) -> PauliOperator:
"""Single-particle XX+YY hopping on L sites with periodic boundary conditions."""
zz_bonds = [[1.0, i, (i + 1) % L] for i in range(L)]
x_bonds = [[1.0, i] for i in range(L)]
return PauliOperator([("zz", zz_bonds)], [("x", x_bonds)])
return PauliOperator([("xx", zz_bonds)], [("z", x_bonds)])

Copilot AI Apr 11, 2026


_make_operator docstring says this builds an "XX+YY hopping" operator, but the returned PauliOperator uses ("xx", zz_bonds) and ("z", x_bonds). This mismatch makes the benchmark misleading and likely constructs the wrong Hamiltonian. Please update the operator terms (and/or variable names) to match the intended XX+YY model.

Comment on lines +65 to +69
ham = Hamiltonian(Q, [Static(), Hx])
input = np.zeros(basis.size, dtype=np.complex128)
print(basis.index("0000000000"))
input[basis.index("0000000000")] = 1.0
output = np.zeros_like(input)

Copilot AI Apr 11, 2026


basis.index("0000000000") hard-codes a 10-site bitstring, but L is set to 4 above, so this will error/panic for most runs. Consider using "0" * L (or the chosen seed) consistently when indexing and initializing the state vector.

Comment on lines +66 to +82
input = np.zeros(basis.size, dtype=np.complex128)
print(basis.index("0000000000"))
input[basis.index("0000000000")] = 1.0
output = np.zeros_like(input)

# TODO:
# * see if we can merge `dot` and `dot_many` into a single python biniding.
# * expose dot_transpose in the same way.
# * Try to make output and overwrite optional
#
ham.dot_many(0.5, input.reshape((-1, 1)), output.reshape((-1, 1)), True)


# TODO: Fix this this interface, should have input and output arrays
a = 0.01j
print(input)
ham.expm_dot(0.0, a, input1 := input.copy())

Copilot AI Apr 11, 2026


input is used as a variable name here, which shadows Python’s built-in input() function. Renaming it (e.g., psi0/state_in) would avoid confusion when extending this script.

Suggested change
input = np.zeros(basis.size, dtype=np.complex128)
print(basis.index("0000000000"))
input[basis.index("0000000000")] = 1.0
output = np.zeros_like(input)
# TODO:
# * see if we can merge `dot` and `dot_many` into a single python biniding.
# * expose dot_transpose in the same way.
# * Try to make output and overwrite optional
#
ham.dot_many(0.5, input.reshape((-1, 1)), output.reshape((-1, 1)), True)
# TODO: Fix this this interface, should have input and output arrays
a = 0.01j
print(input)
ham.expm_dot(0.0, a, input1 := input.copy())
state_in = np.zeros(basis.size, dtype=np.complex128)
print(basis.index("0000000000"))
state_in[basis.index("0000000000")] = 1.0
output = np.zeros_like(state_in)
# TODO:
# * see if we can merge `dot` and `dot_many` into a single python biniding.
# * expose dot_transpose in the same way.
# * Try to make output and overwrite optional
#
ham.dot_many(0.5, state_in.reshape((-1, 1)), output.reshape((-1, 1)), True)
# TODO: Fix this this interface, should have input and output arrays
a = 0.01j
print(state_in)
ham.expm_dot(0.0, a, input1 := state_in.copy())

Comment on lines +329 to +348

    // tmp and f are split into per-thread chunks using safe Rust: each thread
    // receives exclusive ownership of its contiguous slice (tmp) or strided
    // sub-view (f) with no aliasing.
    let tmp_chunks: Vec<&mut [V]> = tmp_buf.chunks_mut(chunk).collect();

    let mut f_parts: Vec<ArrayViewMut1<'_, V>> = Vec::with_capacity(n_threads);
    {
        let mut remaining = f;
        for _ in 0..n_threads - 1 {
            let (lo, hi) = remaining.split_at(Axis(0), chunk);
            f_parts.push(lo);
            remaining = hi;
        }
        f_parts.push(remaining);
    }

    std::thread::scope(|scope| {
        for (tid, (tmp_chunk, mut f_chunk)) in tmp_chunks.into_iter().zip(f_parts).enumerate() {
            let begin = tid * chunk;
            let n_local = tmp_chunk.len(); // rows owned by this thread

Copilot AI Apr 11, 2026


n_threads is used to size the Barrier and per-thread norm buffers, but the actual number of spawned threads is derived from tmp_buf.chunks_mut(chunk) and can be less than n_threads (e.g., n=257, n_threads=128 ⇒ chunk=3 ⇒ only 86 chunks). This will deadlock at the barrier (or panic during split_at when remaining < chunk). Please ensure the chunking logic produces exactly n_threads disjoint ranges (e.g., build explicit (begin,end) ranges for tid in 0..n_threads, with the last range shorter), and spawn exactly n_threads workers to match the Barrier size.

Suggested change

    // tmp and f are split into per-thread chunks using safe Rust: each thread
    // receives exclusive ownership of its contiguous slice (tmp) or strided
    // sub-view (f) with no aliasing.
    let tmp_chunks: Vec<&mut [V]> = tmp_buf.chunks_mut(chunk).collect();
    let mut f_parts: Vec<ArrayViewMut1<'_, V>> = Vec::with_capacity(n_threads);
    {
        let mut remaining = f;
        for _ in 0..n_threads - 1 {
            let (lo, hi) = remaining.split_at(Axis(0), chunk);
            f_parts.push(lo);
            remaining = hi;
        }
        f_parts.push(remaining);
    }
    std::thread::scope(|scope| {
        for (tid, (tmp_chunk, mut f_chunk)) in tmp_chunks.into_iter().zip(f_parts).enumerate() {
            let begin = tid * chunk;
            let n_local = tmp_chunk.len(); // rows owned by this thread

    // Build exactly `n_threads` disjoint row ranges so the number of spawned
    // workers always matches the Barrier size and per-thread scratch buffers.
    let n_rows = tmp_buf.len();
    let ranges: Vec<(usize, usize)> = (0..n_threads)
        .map(|tid| {
            let begin = (tid * chunk).min(n_rows);
            let end = (begin + chunk).min(n_rows);
            (begin, end)
        })
        .collect();
    // tmp and f are split into exactly the same per-thread partitions using
    // safe Rust: each thread receives exclusive ownership of its contiguous
    // slice (tmp) or sub-view (f) with no aliasing. Some trailing partitions
    // may be empty when `n_threads > n_rows`.
    let mut tmp_chunks: Vec<&mut [V]> = Vec::with_capacity(n_threads);
    {
        let mut remaining = tmp_buf;
        for &(begin, end) in &ranges {
            let len = end - begin;
            let (lo, hi) = remaining.split_at_mut(len);
            tmp_chunks.push(lo);
            remaining = hi;
        }
    }
    let mut f_parts: Vec<ArrayViewMut1<'_, V>> = Vec::with_capacity(n_threads);
    {
        let mut remaining = f;
        for &(begin, end) in &ranges {
            let len = end - begin;
            let (lo, hi) = remaining.split_at(Axis(0), len);
            f_parts.push(lo);
            remaining = hi;
        }
    }
    std::thread::scope(|scope| {
        for (tid, (((begin, end), tmp_chunk), mut f_chunk)) in ranges
            .into_iter()
            .zip(tmp_chunks)
            .zip(f_parts)
            .enumerate()
        {
            let n_local = end - begin; // rows owned by this thread
Comment on lines +17 to 22

/// Returns a lower bound on `‖B^p‖_1`.
pub fn onenorm_matrix_power_nnm<V, Op>(op: &Op, a: V, mu: V, p: usize, ell: usize) -> V::Real
where
    M: Primitive,
    I: Index,
    C: CIndex,
    V: ExpmComputation,
    Op: LinearOperator<V>,
{

Copilot AI Apr 11, 2026


The 1-norm estimator now runs in the generic scalar type V and returns V::Real. For V=f32 / Complex<f32>, this reduces the estimator precision compared to the previous f64 implementation and can change parameter selection (m*, s) in fragment_3_1. If the intent is to keep norm/parameter selection in full precision regardless of compute dtype, consider making onenorm_matrix_power_nnm (and its internal arithmetic) always operate in f64/Complex<f64> and return f64.

Comment on lines +154 to 162

    let est = onenorm_matrix_power_nnm(self.op, self.a, self.mu, p, self.ell);
    // Convert V::Real → f64 via round-trip through Complex<f64>.
    let est_f64 = V::from_real(est).to_complex().re;
    // d(p) = ||B^p||_1^(1/p)
    let d_p = if est == 0.0 {
    let d_p = if est_f64 == 0.0 {
        0.0
    } else {
        est.powf(1.0 / p as f64)
        est_f64.powf(1.0 / p as f64)
    };

Copilot AI Apr 11, 2026


LazyNormInfo::d converts the estimator result to f64 via V::from_real(est).to_complex().re. Besides being indirect, this round-trip will quantize when V::Real is f32, further reducing the accuracy of d(p)/alpha(p) and therefore (m*, s) selection. If norm estimation is meant to be done in full precision, returning f64 directly from the estimator would avoid these lossy conversions.

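To make the quantization concern concrete, here is a tiny standalone sketch (not crate code) of how a round-trip through f32 discards precision before the f64 arithmetic in `d(p)` ever runs; the specific value is arbitrary.

```rust
fn main() {
    // A norm estimate with more precision than f32 can hold.
    let est: f64 = 0.123_456_789_012_345_6;

    // Simulate the V::Real = f32 path: the estimate is quantized to ~7
    // significant digits before reaching the f64 parameter-selection logic.
    let quantized = est as f32 as f64;

    assert!(quantized != est); // precision was lost in the round-trip
    assert!((quantized - est).abs() < 1e-7); // but the absolute error is small
}
```

Whether this matters in practice depends on how sensitive the (m*, s) selection in `fragment_3_1` is to small perturbations in d(p); keeping the estimator in f64 throughout, as the comment suggests, sidesteps the question entirely.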
/// Unlike `dot_chunk`, the transpose scatters writes to arbitrary column
/// indices, so different threads processing disjoint row ranges will
/// concurrently write to the same output slots. The output is therefore
/// typed as `&[V::Atomic]` — a shared, atomically-acumulatable array —

Copilot AI Apr 11, 2026


Typo in comment: "atomically-acumulatable" should be "atomically-accumulatable" (or "atomically accumulatable").

Suggested change
/// typed as `&[V::Atomic]` — a shared, atomically-acumulatable array —
/// typed as `&[V::Atomic]` — a shared, atomically-accumulatable array —

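For context on why `dot_transpose_chunk` can take a shared `&[V::Atomic]` output, here is a hedged sketch of a lock-free f64 accumulator in the spirit of the PR's `AtomicAccum`; the name `AtomicF64` and the CAS-loop strategy are illustrative assumptions, not the crate's implementation.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// f64 stored as its bit pattern in an AtomicU64, so many threads can
/// accumulate into the same slot without locks (sketch, names assumed).
struct AtomicF64(AtomicU64);

impl AtomicF64 {
    fn new(v: f64) -> Self {
        AtomicF64(AtomicU64::new(v.to_bits()))
    }

    /// Lock-free `self += v` via a compare-exchange loop on the bit pattern.
    fn fetch_add(&self, v: f64) {
        let mut cur = self.0.load(Ordering::Relaxed);
        loop {
            let next = (f64::from_bits(cur) + v).to_bits();
            match self
                .0
                .compare_exchange_weak(cur, next, Ordering::Relaxed, Ordering::Relaxed)
            {
                Ok(_) => return,
                Err(actual) => cur = actual, // another thread won; retry with its value
            }
        }
    }

    fn get(&self) -> f64 {
        f64::from_bits(self.0.load(Ordering::Relaxed))
    }
}

fn main() {
    // Four threads scatter-accumulate into one shared slot, as in the
    // transpose path where disjoint row ranges write to overlapping columns.
    let acc = AtomicF64::new(0.0);
    std::thread::scope(|s| {
        for _ in 0..4 {
            s.spawn(|| {
                for _ in 0..1000 {
                    acc.fetch_add(0.5); // 0.5 is exactly representable
                }
            });
        }
    });
    assert_eq!(acc.get(), 2000.0); // 4 threads × 1000 adds × 0.5
}
```

Note that floating-point addition is not associative, so for values that are not exactly representable the accumulated result can vary run to run; the exact-half increments above keep the assertion deterministic.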
- algorithm.rs: replace chunks_mut with explicit (begin,end) ranges so
  exactly n_threads workers are spawned, matching the Barrier size and
  preventing a potential deadlock when n is not a multiple of chunk.
  Same fix applied to f_parts construction (latent panic when
  (n_threads-1)*chunk > n).
- linear_operator.rs: fix typo "acumulatable" → "accumulatable"
- norm_est.rs: document f32 precision tradeoff in onenorm_matrix_power_nnm
- benchmark.py: fix variable names (zz_bonds→xx_bonds, x_bonds→z_bonds),
  update docstring to match actual operator, replace hardcoded 10-site
  bitstring with "0"*L, rename `input` (shadows builtin) to `psi0`

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
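The range-construction fix described above can be sketched as a standalone helper (the helper name is assumed; the crate builds the ranges inline), including the failure case the review flagged:

```rust
/// Build exactly `n_threads` disjoint row ranges so the spawned worker
/// count always matches the Barrier size. Trailing ranges may be empty
/// when `n_threads > n_rows`.
fn partition_rows(n_rows: usize, n_threads: usize) -> Vec<(usize, usize)> {
    // Ceiling division, matching the PR's per-thread `chunk` size.
    let chunk = (n_rows + n_threads - 1) / n_threads;
    (0..n_threads)
        .map(|tid| {
            let begin = (tid * chunk).min(n_rows);
            let end = (begin + chunk).min(n_rows);
            (begin, end)
        })
        .collect()
}

fn main() {
    // The failure case from the review: n = 257, n_threads = 128 ⇒ chunk = 3,
    // where chunks_mut would have produced only 86 chunks and deadlocked.
    let ranges = partition_rows(257, 128);
    assert_eq!(ranges.len(), 128); // always exactly n_threads ranges
    assert_eq!(ranges[85], (255, 257)); // last non-empty range is shorter
    assert!(ranges[86..].iter().all(|&(b, e)| b == e)); // trailing empties

    // Ranges are disjoint and cover every row exactly once.
    let covered: usize = ranges.iter().map(|&(b, e)| e - b).sum();
    assert_eq!(covered, 257);
}
```

Empty trailing ranges mean some workers do no matvec work, but they still hit the Barrier, which is what keeps the synchronization correct.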
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 8 out of 9 changed files in this pull request and generated 3 comments.





L = 30
# TODO build example onf PZ symmetry group, particle-hole symmetry, etc. need to extend symmetry groups

Copilot AI Apr 11, 2026


Spelling: "onf" should be "of" in this TODO comment.

Suggested change
# TODO build example onf PZ symmetry group, particle-hole symmetry, etc. need to extend symmetry groups
# TODO build example of PZ symmetry group, particle-hole symmetry, etc. need to extend symmetry groups

output = np.zeros_like(psi0)

# TODO:
# * see if we can merge `dot` and `dot_many` into a single python biniding.

Copilot AI Apr 11, 2026


Spelling: "biniding" should be "binding".

Suggested change
# * see if we can merge `dot` and `dot_many` into a single python biniding.
# * see if we can merge `dot` and `dot_many` into a single python binding.

ham.dot_many(0.5, psi0.reshape((-1, 1)), output.reshape((-1, 1)), True)


# TODO: Fix this this interface, should have input and output arrays

Copilot AI Apr 11, 2026


Grammar: duplicated word "this" in the TODO comment; consider rewriting to "Fix this interface".

Suggested change
# TODO: Fix this this interface, should have input and output arrays
# TODO: Fix this interface, should have input and output arrays

@weinbe58 weinbe58 merged commit ae9b135 into main Apr 13, 2026
5 checks passed