
coll: update the json selection of MPIR_Bcast_intra_scatter_ring_allgather #7332

Draft
hzhou wants to merge 2 commits into pmodels:main from hzhou:2503_csel_perproc

Conversation

@hzhou (Contributor) commented Mar 12, 2025

Pull Request Description

MPIR_Bcast_intra_scatter_ring_allgather performs poorly when the
per_proc_msg_size (chunk size) is too small, because latency accumulates
in each round.

Fixes #7330
[skip warnings]

Author Checklist

  • Provide Description
    Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
  • Commits Follow Good Practice
    Commits are self-contained and do not do two things at once.
    Commit message is of the form: module: short description
    Commit message explains what's in the commit.
  • Passes All Tests
    Whitespace checker. Warnings test. Additional tests via comments.
  • Contribution Agreement
    For non-Argonne authors, check contribution agreement.
    If necessary, request an explicit comment from your company's PR approval manager.

hzhou added 2 commits March 12, 2025 17:23
For some algorithms, e.g. MPIR_Bcast_intra_scatter_ring_allgather, what
matters is not the total message size but the message size per process,
since that determines whether each round is bandwidth-bound.
MPIR_Bcast_intra_scatter_ring_allgather performs poorly when the
per_proc_msg_size (chunk size) is too small, because latency accumulates
in each round.
@hzhou hzhou requested a review from mjwilkins18 March 12, 2025 22:51
@mjwilkins18 (Contributor)

Did you do a performance test with this, or are these changes just based on the conversation earlier this week? It's on my TODO list to do some performance runs on Aurora, so I can try testing this too.

@hzhou (Contributor, Author) commented Mar 13, 2025

> Did you do a performance test with this, or are these changes just based on the conversation earlier this week? It's on my TODO list to do some performance runs on Aurora, so I can try testing this too.

Thanks for volunteering! :)

The patch addresses the obvious issue so the algorithm doesn't perform outrageously badly. Yes, we should use tests to fine-tune the threshold.



Development

Successfully merging this pull request may close these issues.

perf: MPI_Bcast show significant variance (default is very bad)
