
[WIP] Add perf comparison on non-main branches with run-sweep workflow #880

Open
cquil11 wants to merge 6 commits into main from feat/run-sweep-compare

Conversation

@cquil11
Collaborator

@cquil11 cquil11 commented Mar 6, 2026

Summary

This PR adds automated performance comparison for PR sweep runs and includes a minor test config change.

What it accomplishes

  1. New compare-results job in run-sweep.yml: When a sweep runs on a PR (non-main branch), this job automatically compares the benchmark results from the PR against the most recent baseline results from main. The comparison is rendered as a throughput table in the GitHub Actions Step Summary, showing current vs. baseline tok/s/gpu with delta and percentage change.

  2. New utils/compare_results.py script: Implements the comparison logic:

    • Reads benchmark result JSON artifacts produced by the sweep
    • Queries the Neon PostgreSQL database for the most recent matching baseline result on main (matching by hardware, model, framework, precision, parallelism, ISL/OSL, and concurrency)
    • Computes throughput deltas and formats a markdown table using tabulate
    • Supports both single-node and multi-node (disaggregated) configurations
  3. Test perf-changelog.yaml entry: Adds a test entry for gptoss-fp4-b200-vllm to trigger the sweep workflow on this PR branch.

  4. Config change in nvidia-master.yaml: Comments out the 1k8k and 8k1k search-space entries for gptoss-fp4-b200-vllm (keeping only 1k1k) to reduce the test sweep scope.
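The baseline-lookup and delta logic described in item 2 could be sketched as follows. This is an illustrative sketch only: the real `utils/compare_results.py` queries the Neon PostgreSQL database, whereas here an in-memory sqlite3 table with a hypothetical schema (table `results`, columns for each config dimension, `tokens_per_sec_per_gpu`, `created_at`) stands in for it.

```python
# Sketch of the baseline lookup in utils/compare_results.py.
# The real script talks to Neon PostgreSQL; sqlite3 stands in here,
# and the table/column names are hypothetical.
import sqlite3

# Config dimensions used to match a PR result to a main baseline
# (hardware, model, framework, precision, TP/EP, ISL/OSL, concurrency).
DIMENSIONS = ("hardware", "model", "framework", "precision",
              "tp", "ep", "isl", "osl", "concurrency")

def latest_main_baseline(conn, result):
    """Fetch the most recent main-branch throughput matching every dimension."""
    where = " AND ".join(f"{d} = ?" for d in DIMENSIONS)
    row = conn.execute(
        f"SELECT tokens_per_sec_per_gpu FROM results "
        f"WHERE branch = 'main' AND {where} "
        f"ORDER BY created_at DESC LIMIT 1",
        [result[d] for d in DIMENSIONS],
    ).fetchone()
    return row[0] if row else None

def compare(current_tps, baseline_tps):
    """Return (delta, pct_change) of current vs. baseline tok/s/gpu."""
    delta = current_tps - baseline_tps
    return delta, 100.0 * delta / baseline_tps

# Tiny demo with one seeded main-branch baseline row.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (branch TEXT, hardware TEXT, model TEXT, "
    "framework TEXT, precision TEXT, tp INT, ep INT, isl INT, osl INT, "
    "concurrency INT, tokens_per_sec_per_gpu REAL, created_at TEXT)"
)
cfg = dict(hardware="b200", model="gptoss", framework="vllm",
           precision="fp4", tp=1, ep=1, isl=1024, osl=1024, concurrency=8)
conn.execute(
    "INSERT INTO results VALUES ('main', :hardware, :model, :framework, "
    ":precision, :tp, :ep, :isl, :osl, :concurrency, 100.0, '2026-03-01')",
    cfg)
baseline = latest_main_baseline(conn, cfg)
delta, pct = compare(110.0, baseline)
print(f"{delta:+.1f} tok/s/gpu ({pct:+.1f}%)")  # +10.0 tok/s/gpu (+10.0%)
```

In the actual script each PR result JSON would play the role of `cfg`, and the formatted deltas feed the markdown table.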

How it works

  • The compare-results job runs only on pull_request events after collect-results succeeds
  • It downloads the aggregated benchmark artifacts, then runs compare_results.py against the results database
  • For each benchmark result, it looks up the latest matching main branch result by config dimensions (hardware, model, framework, precision, TP/EP, ISL, OSL, concurrency)
  • Outputs a formatted comparison table to $GITHUB_STEP_SUMMARY so reviewers can see performance impact at a glance
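The gating described in the first bullet could look roughly like the fragment below. This is a hypothetical sketch, not the actual run-sweep.yml: the artifact name (`aggregated-benchmark-results`), secret name (`NEON_DATABASE_URL`), and script invocation are all assumptions.

```yaml
# Hypothetical sketch; real job/step/artifact names in run-sweep.yml may differ.
compare-results:
  if: github.event_name == 'pull_request'   # PR sweeps only, not main
  needs: collect-results                    # run after aggregation succeeds
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: actions/download-artifact@v4
      with:
        name: aggregated-benchmark-results  # assumed artifact name
        path: results/
    - name: Compare against main baseline
      env:
        DATABASE_URL: ${{ secrets.NEON_DATABASE_URL }}  # assumed secret name
      run: |
        # Append the markdown comparison table to the Step Summary
        python utils/compare_results.py results/ >> "$GITHUB_STEP_SUMMARY"
```

Writing to `$GITHUB_STEP_SUMMARY` is what surfaces the table in the Actions run page without needing a PR comment.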

@github-actions
Contributor

github-actions bot commented Mar 6, 2026

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If they are not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.

2 similar comments

@jgangani
Collaborator

jgangani commented Mar 6, 2026

ACK. @claude can you add a short MR description? What does this MR accomplish, and how?

@Klaud-Cold
Collaborator

Klaud-Cold commented Mar 6, 2026

Claude finished @jgangani's task in 54s.


Adding PR Description

  • Read changed files to understand the PR
  • Write PR description
  • Update PR with description

Done — added a description to the PR covering:

  • What: New compare-results job in run-sweep.yml and utils/compare_results.py script for automated throughput comparison on PR sweeps
  • How: Downloads benchmark artifacts, queries the Neon DB for the latest matching main baseline, computes deltas, and renders a markdown table in the GitHub Actions Step Summary
  • Also: Test perf-changelog entry and scoped-down gptoss-fp4-b200-vllm config for testing

@functionstackx
Contributor

@cquil11 can you add baseline interactivity too, in addition to baseline throughput?

similar to what we have on the dashboard

[attached image: dashboard screenshot]
