Add container and tuple optimization helpers #3590

tenpercent · 2026-01-16T07:33:23Z

Summary

- Replace lambdas with named functors in container_concat
- Add make_uniform_tuple helper for repeated value patterns
- Reduce container_concat instantiations from 186 to 93 (50% reduction)

Test Plan

Waiting for full CI

PR Stack

| # | PR | Description |
|---|----|-------------|
| 1 | #3585 | sequence_gen with `__make_integer_seq` |
| 2 | #3588 | generate_identity_sequences helper |
| 3 | #3589 | Named functors in transform_tensor_descriptor |
| 4 | #3590 | container_concat optimization |
| 5 | #3596 | O(1) pack expansion rewrites |
| 6 | #3600 | TensorDescriptor/TensorAdaptor lambda elimination |

Tracking issue: #3575

Add a `make_uniform_tuple<N>(value)` helper to replace the common pattern `generate_tuple([&](auto) { return value; }, Number<N>{})`. This avoids unique lambda instantiations when creating tuples with repeated values. Applied to device_grouped_conv_fwd_multiple_abd.
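
The idea behind the helper can be sketched as follows. This is a minimal illustration, not the PR's implementation: it assumes `std::tuple` and `std::index_sequence` as stand-ins for the library's own tuple and `Number<N>` types, and the `sketch` namespace, `repeat_value`, and `make_uniform_tuple_impl` are hypothetical names introduced only for this example.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

namespace sketch {

// Ignores its index parameter and returns a copy of the value. Expanding it
// over an index pack repeats the value once per index, with no lambda involved.
template <std::size_t, typename T>
constexpr T repeat_value(const T& value)
{
    return value;
}

template <typename T, std::size_t... Is>
constexpr auto make_uniform_tuple_impl(const T& value, std::index_sequence<Is...>)
{
    return std::make_tuple(repeat_value<Is>(value)...);
}

// Assumption: std::tuple stands in for the library's tuple type.
// Builds a tuple holding N copies of value.
template <std::size_t N, typename T>
constexpr auto make_uniform_tuple(const T& value)
{
    return make_uniform_tuple_impl(value, std::make_index_sequence<N>{});
}

} // namespace sketch

// The result has N elements, each of the value's type.
static_assert(std::is_same_v<decltype(sketch::make_uniform_tuple<3>(7)),
                             std::tuple<int, int, int>>);
```

Because the helper is an ordinary function template, two call sites with the same `N` and value type share one instantiation, whereas each `[&](auto) { return value; }` lambda would have its own closure type.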

Replace the recursive container_reduce, whose template instantiation depth grows as O(N), with an O(1)-depth fold expression for computing products of container elements. This reduces template instantiation depth from 26 to 23 levels.

- Add container_product() using unpack + fold expression
- Migrate 10 call sites from container_reduce(x, multiplies{}, 1)
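
A product computed with a fold expression might look like the sketch below. It is an assumption-laden illustration rather than the PR's code: `std::apply` over a `std::tuple` stands in for the library's `unpack` over its own containers, and `product_of_pack` and the `sketch` namespace are hypothetical names.

```cpp
#include <tuple>

namespace sketch {

// Named functor that multiplies all arguments with a single binary fold.
// The leading 1 makes the empty pack well-defined (the product is 1).
struct product_of_pack
{
    template <typename... Xs>
    constexpr auto operator()(const Xs&... xs) const
    {
        return (1 * ... * xs);
    }
};

// Assumption: std::apply plays the role of the library's unpack.
// The pack is expanded once, so no recursive template chain is needed.
template <typename... Ts>
constexpr auto container_product(const std::tuple<Ts...>& t)
{
    return std::apply(product_of_pack{}, t);
}

} // namespace sketch

static_assert(sketch::container_product(std::make_tuple(2, 3, 4)) == 24);
```

A recursive reduce instantiates one template per element, each nested inside the previous one; the fold expands the whole pack in a single instantiation, which is where the depth saving comes from.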

Lambdas create unique types per call site, causing duplicate template instantiations. Named functors are shared across call sites.

Results:

- container_concat: 186 → 93 instantiations (50% reduction)
- Wall-clock: 518ms → 309ms (40% reduction)
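
The type-uniqueness point can be checked directly in standard C++. The sketch below is illustrative only; `add_one`, `apply_to_one`, `lambda_a`, and `lambda_b` are hypothetical names, not code from the PR.

```cpp
#include <type_traits>

namespace sketch {

// A named functor: one type, no matter how many call sites use it.
struct add_one
{
    constexpr int operator()(int x) const { return x + 1; }
};

// Two lambdas with identical bodies still have two distinct closure types.
constexpr auto lambda_a = [](int x) { return x + 1; };
constexpr auto lambda_b = [](int x) { return x + 1; };

// Any template parameterized on the callable is instantiated once per
// distinct callable type passed to it.
template <typename F>
constexpr int apply_to_one(F f)
{
    return f(1);
}

} // namespace sketch

// Identical source text, different types: apply_to_one is instantiated twice.
static_assert(!std::is_same_v<decltype(sketch::lambda_a), decltype(sketch::lambda_b)>);

// Both lambdas and the functor compute the same result, but every use of
// add_one reuses the single apply_to_one<add_one> instantiation.
static_assert(sketch::apply_to_one(sketch::lambda_a) == 2);
static_assert(sketch::apply_to_one(sketch::lambda_b) == 2);
static_assert(sketch::apply_to_one(sketch::add_one{}) == 2);
```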

tenpercent requested review from Snektron, ThomasNing, afagaj, andriy-ca, aosewski, asleepzzz, bartekxk, carlushuang, cgmillette, coderfeli, geyyer, illsilin, poyenc, qianfengz, shumway, vidyasagar-amd and vpietila-amd as code owners January 16, 2026 07:33

tenpercent marked this pull request as draft January 16, 2026 15:48

tenpercent force-pushed the mpodkory/generate-tuple-optimizations branch from 59f0c32 to 5190578 Compare January 16, 2026 17:34

tenpercent force-pushed the mpodkory/transform-tensor-descriptor-optimization branch from 885b80f to 0791bad Compare January 16, 2026 20:16

tenpercent force-pushed the mpodkory/generate-tuple-optimizations branch from 5190578 to 887bdf2 Compare January 16, 2026 20:16

tenpercent mentioned this pull request Jan 16, 2026

Replace nested static_for lambdas with compile-time search helper #3600

Open

1 task

tenpercent force-pushed the mpodkory/transform-tensor-descriptor-optimization branch from 0791bad to b26ed88 Compare January 17, 2026 03:37

tenpercent marked this pull request as ready for review January 17, 2026 03:41

tenpercent added 2 commits January 16, 2026 21:46

tenpercent force-pushed the mpodkory/transform-tensor-descriptor-optimization branch from b26ed88 to 00849ac Compare January 17, 2026 03:51

tenpercent force-pushed the mpodkory/generate-tuple-optimizations branch from 887bdf2 to 02e42dc Compare January 17, 2026 03:51

tenpercent mentioned this pull request Jan 19, 2026

Add unit tests for template optimization helpers #3610

Open
