[WIP] add perf comparison on non-main branches with run-sweep workflow #880
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook. If they are not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you |
|
ACK. @claude can you add a short MR description: what this MR accomplishes and how? |
|
Claude finished @jgangani's task in 54s: Adding PR Description
Done — added a description to the PR covering:
|
|
@cquil11 can you add baseline interactivity in addition to baseline throughput, similar to what we have on the dashboard?
|

Summary
This PR adds automated performance comparison for PR sweep runs and includes a minor test config change.
What it accomplishes
- New `compare-results` job in `run-sweep.yml`: When a sweep runs on a PR (non-main branch), this job automatically compares the benchmark results from the PR against the most recent baseline results from `main`. The comparison is rendered as a throughput table in the GitHub Actions Step Summary, showing current vs. baseline `tok/s/gpu` with delta and percentage change.
- New `utils/compare_results.py` script: Implements the comparison logic. It looks up the baseline on `main` (matching by hardware, model, framework, precision, parallelism, ISL/OSL, and concurrency) and formats the table with `tabulate`.
- Test `perf-changelog.yaml` entry: Adds a test entry for `gptoss-fp4-b200-vllm` to trigger the sweep workflow on this PR branch.
- Config change in `nvidia-master.yaml`: Comments out the 1k8k and 8k1k search-space entries for `gptoss-fp4-b200-vllm` (keeping only 1k1k) to reduce the test sweep scope.
How it works
1. The `compare-results` job runs only on `pull_request` events, after `collect-results` succeeds.
2. It runs `compare_results.py` against the results database.
3. Each PR result is matched to a `main` branch result by config dimensions (hardware, model, framework, precision, TP/EP, ISL, OSL, concurrency).
4. The comparison is written to `$GITHUB_STEP_SUMMARY` so reviewers can see performance impact at a glance.
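The matching and delta steps described above could look roughly like the sketch below. It is not the code in `utils/compare_results.py`: the row layout, the field names (`tok_s_gpu`, `timestamp`, `tp`, `ep`, and so on) and all function names are assumptions made for illustration, and plain string formatting stands in for `tabulate` so the sketch has no third-party dependency.

```python
import os

# Hypothetical field names for the config dimensions listed in the description.
KEY_FIELDS = ("hardware", "model", "framework", "precision",
              "tp", "ep", "isl", "osl", "concurrency")


def config_key(row):
    """Build the tuple used to match a PR result to a baseline result."""
    return tuple(row.get(field) for field in KEY_FIELDS)


def compare(pr_rows, baseline_rows):
    """Pair each PR row with the newest baseline row that has the same config."""
    latest = {}
    for row in baseline_rows:
        key = config_key(row)
        # Assumes "timestamp" sorts chronologically (e.g. ISO 8601 strings).
        if key not in latest or row["timestamp"] > latest[key]["timestamp"]:
            latest[key] = row

    results = []
    for row in pr_rows:
        base = latest.get(config_key(row))
        current = row["tok_s_gpu"]
        baseline = base["tok_s_gpu"] if base else None
        if baseline:  # no delta when the baseline is missing or zero
            delta = current - baseline
            pct = delta / baseline * 100.0
        else:
            delta = pct = None
        results.append({"key": config_key(row), "current": current,
                        "baseline": baseline, "delta": delta, "pct": pct})
    return results


def _fmt(value, spec, suffix=""):
    """Format a number, or show n/a when there is nothing to compare against."""
    return "n/a" if value is None else format(value, spec) + suffix


def to_markdown(results):
    """Render the comparison as a GitHub-flavored markdown table."""
    lines = ["| config | current tok/s/gpu | baseline tok/s/gpu | delta | change |",
             "| --- | --- | --- | --- | --- |"]
    for r in results:
        config = "/".join(str(v) for v in r["key"])
        lines.append(
            f"| {config} | {_fmt(r['current'], '.1f')} | {_fmt(r['baseline'], '.1f')} "
            f"| {_fmt(r['delta'], '+.1f')} | {_fmt(r['pct'], '+.1f', '%')} |"
        )
    return "\n".join(lines)


def write_step_summary(markdown, path=None):
    """Append markdown to the job summary file, or print it outside of Actions."""
    path = path or os.environ.get("GITHUB_STEP_SUMMARY")
    if not path:
        print(markdown)
        return
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(markdown + "\n")
```

A PR row with no matching baseline is kept in the table with `n/a` cells rather than dropped, so a reviewer can tell "no baseline yet" apart from "no change". Whether the actual script behaves this way is not stated in the description.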