Commits
78 commits
f0b8f42
[AMD]: fix AITER flags for vllm v0.14.0 docker image (#535)
Rohan138 Jan 26, 2026
0b66978
fix: add final newline to original perf-changelog.yaml so that there …
cquil11 Jan 26, 2026
8252370
chore: run official sweep to add evals (#558)
Oseltamivir Jan 26, 2026
bac15ad
[NV] dsr1 fp4 b300 dynamo trtllm (#532)
jthomson04 Jan 27, 2026
a045c1b
fix: add final newline to original perf-changelog.yaml so that there …
cquil11 Jan 27, 2026
f335aa8
Chun reima/sglang fp8 v0.5.8 mi355 (#572)
chunfangamd Jan 27, 2026
36b9afa
initial commit
cquil11 Jan 27, 2026
848fbca
Revert "[NV] dsr1 fp4 b300 dynamo trtllm (#532)" (#583) [skip-sweep]
cquil11 Jan 27, 2026
61b2b21
Increase eval timeout (#584)
Oseltamivir Jan 27, 2026
d541bf0
chore: save server long as artifact after single node runs (#576)
cquil11 Jan 27, 2026
28bc58e
revert (#586) [skip-sweep]
cquil11 Jan 27, 2026
5ca7960
chore: add pre-merge check for newline in perf-changelog.yaml (#579)
cquil11 Jan 27, 2026
77fe65b
Merge remote-tracking branch 'origin/main' into experimental/multi-tu…
cquil11 Jan 27, 2026
36110f1
commit
cquil11 Jan 29, 2026
a10b511
chore: add large data files to gitignore
cquil11 Jan 29, 2026
a1acc8a
update vllm bench
cquil11 Jan 30, 2026
8a33d1f
save response
cquil11 Jan 30, 2026
0735853
metrics collector
cquil11 Feb 2, 2026
b5ffc81
add pbar
cquil11 Feb 3, 2026
d1a5cc5
add pbar
cquil11 Feb 3, 2026
26d7e0a
metrics collector fix
cquil11 Feb 3, 2026
bf045a0
cpu offload metrics
cquil11 Feb 3, 2026
339d71a
cpu offload metrics pt 2
cquil11 Feb 3, 2026
47c5160
fix man num requests
cquil11 Feb 3, 2026
d994856
fix man num requests pt 2
cquil11 Feb 3, 2026
33ca9fa
fix join deadlock
cquil11 Feb 3, 2026
c6c2e84
add new plots
cquil11 Feb 3, 2026
997ed00
add dcgmi and remove some plots
cquil11 Feb 3, 2026
5f1eef1
make tx / rx plots continuous
cquil11 Feb 3, 2026
c42aa1b
make tx / rx plots continuous and add cumsum
cquil11 Feb 3, 2026
b43d46a
stop collection
cquil11 Feb 3, 2026
25de130
always generate plots when metrics collector exists
cquil11 Feb 3, 2026
6aa3680
add metrics csv
cquil11 Feb 4, 2026
d449c7e
sweep
cquil11 Feb 4, 2026
5a2bfa2
sweep
cquil11 Feb 4, 2026
c4f8562
sweep
cquil11 Feb 4, 2026
e7226f1
sweep
cquil11 Feb 4, 2026
1a6c303
resume
cquil11 Feb 4, 2026
1103782
change
cquil11 Feb 4, 2026
feb0f3a
change cleanup
cquil11 Feb 4, 2026
337a203
retries
cquil11 Feb 4, 2026
193d206
lower num requests
cquil11 Feb 4, 2026
4d7f2b6
more aggressive cleanup
cquil11 Feb 4, 2026
6aaf832
test
cquil11 Feb 6, 2026
ccb3fe8
full sweep
cquil11 Feb 9, 2026
4ac0155
limit max num requests
cquil11 Feb 9, 2026
ebd571e
reorganize
cquil11 Feb 25, 2026
bab1543
reorganize
cquil11 Feb 26, 2026
2582608
update
cquil11 Feb 27, 2026
c3d0a88
update gitignore
cquil11 Feb 27, 2026
e0b46a6
add new sweep
cquil11 Feb 27, 2026
41514f6
add durationg
cquil11 Feb 27, 2026
68b4d0e
add cd to right dir
cquil11 Feb 27, 2026
756f8c2
add cd to right dir
cquil11 Feb 27, 2026
df796c3
add cd to right dir
cquil11 Feb 27, 2026
6584976
max num seqs
cquil11 Feb 27, 2026
f8abfe2
get rid of max cudagraph batch size
cquil11 Feb 27, 2026
b217e6e
get rid of max cudagraph batch size
cquil11 Feb 27, 2026
c24d543
add new metrics
cquil11 Feb 27, 2026
a5e840f
add new launch scripts
cquil11 Feb 27, 2026
eb7e8ec
Merge remote-tracking branch 'origin/main' into experimental/multi-tu…
cquil11 Feb 27, 2026
95b8d98
add new launch scripts
cquil11 Feb 27, 2026
7abc842
add on push
cquil11 Feb 27, 2026
042652e
rm on push
cquil11 Feb 27, 2026
964b8dc
fix file path
cquil11 Feb 27, 2026
0284acf
pip install requirements before runnung
cquil11 Feb 27, 2026
f32a888
Add sample_20k_realistic.json via Git LFS
cquil11 Feb 27, 2026
1c751b0
Add sample_20k_realistic.json via Git LFS pt 2
cquil11 Feb 27, 2026
c40d9cc
install git lfs
cquil11 Feb 27, 2026
1232c35
download dataset
cquil11 Feb 27, 2026
0ed31b8
download dataset pt 2
cquil11 Feb 27, 2026
07413ad
download dataset pt 2
cquil11 Feb 27, 2026
9171abf
5 minutes
cquil11 Feb 27, 2026
f590698
kill clients immediately
cquil11 Feb 28, 2026
124f126
kill clients immediately pt 2
cquil11 Feb 28, 2026
4bfd76f
sentinel
cquil11 Feb 28, 2026
3b926b9
add more analysis scripts
cquil11 Feb 28, 2026
36637ab
add more graphinh
cquil11 Mar 5, 2026
1 change: 1 addition & 0 deletions .gitattributes
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
experimental/multiturn/vllm_benchmark/sample_20k_realistic.json filter=lfs diff=lfs merge=lfs -text
158 changes: 158 additions & 0 deletions .github/workflows/benchmark-multiturn-tmpl.yml
@@ -0,0 +1,158 @@
name: Template - Multi-Turn Benchmark
on:
workflow_call:
inputs:
runner:
required: true
type: string
image:
required: true
type: string
model:
required: true
type: string
precision:
required: false
type: string
default: 'fp4'
exp-name:
required: true
type: string
tp:
required: true
type: string
users:
required: true
type: string
offload-mode:
description: "on = prefix+offload, off = prefix only, noprefix = no prefix caching"
required: true
type: string
duration:
required: true
type: string
think-time:
description: "Log-normal think-time params (mu,sigma)"
required: true
type: string
total-cpu-dram-gb:
required: false
type: string
default: '300'
ref:
description: "Git ref (branch/sha) to checkout"
required: false
type: string

env:
HF_TOKEN: ${{ secrets.HF_TOKEN }}
HF_HUB_CACHE: '/mnt/hf_hub_cache/'
EXP_NAME: ${{ inputs.exp-name }}
MODEL: ${{ inputs.model }}
IMAGE: ${{ inputs.image }}
PRECISION: ${{ inputs.precision }}
FRAMEWORK: 'vllm'
TP: ${{ inputs.tp }}
USERS: ${{ inputs.users }}
OFFLOAD_MODE: ${{ inputs.offload-mode }}
DURATION: ${{ inputs.duration }}
THINK_TIME: ${{ inputs.think-time }}
TOTAL_CPU_DRAM_GB: ${{ inputs.total-cpu-dram-gb }}
SPEC_DECODING: 'off'

permissions:
contents: read

jobs:
benchmark:
runs-on: ${{ inputs.runner }}
timeout-minutes: 180
name: "${{ inputs.exp-name }} tp=${{ inputs.tp }} users=${{ inputs.users }} offload=${{ inputs.offload-mode }}"
steps:
- name: Resource cleanup (pre-run)
# GitHub Actions workflow YAML does not support anchors/aliases, so this
# cleanup script cannot be shared via &anchor/*alias; it has to be written
# out in each step that needs it.
run: |
# Cleanup Docker resources
if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
echo "[Docker] Cleaning up resources ..."
docker ps -aq | xargs -r docker rm -f
docker network prune -f
while [ -n "$(docker ps -aq)" ]; do
docker ps -a
sleep 5
done
fi

# Cleanup SLURM resources
if command -v squeue >/dev/null 2>&1; then
if [[ "${{ runner.name }}" == mi355x-amds* || "${{ runner.name }}" == gb200-nv* || "${{ runner.name }}" == gb300-nv* || "${{ runner.name }}" == h100-dgxc-slurm* || "${{ runner.name }}" == h200-dgxc-slurm* || "${{ runner.name }}" == b200-dgxc-slurm* ]]; then
echo "[Slurm] Cleaning up jobs with name: ${{ runner.name }} ..."
scancel --name="${{ runner.name }}" || true
while [ -n "$(squeue --name='${{ runner.name }}' --noheader --format='%i')" ]; do
squeue --name="${{ runner.name }}"
sleep 5
done
else
echo "[Slurm] Cleaning up jobs for user: $USER ..."
scancel -u "$USER" || true
while [ -n "$(squeue -u "$USER" --noheader --format='%i')" ]; do
squeue -u "$USER"
sleep 5
done
fi
fi

- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
token: ${{ secrets.REPO_PAT }}
fetch-depth: 0
ref: ${{ inputs.ref || github.ref }}


- name: Launch job script
env:
RUNNER_NAME: ${{ runner.name }}
RESULT_DIR: /workspace/results
run: |
bash ./runners/launch_${RUNNER_NAME%%_*}.sh

# The runner script doesn't propagate exit codes (scancel masks them).
# Check status.txt to determine if the benchmark actually succeeded.
if [ ! -f results/status.txt ]; then
echo "Run failed: results/status.txt not found." >&2
exit 1
fi
STATUS=$(cat results/status.txt)
if [ "$STATUS" != "SUCCESS" ]; then
echo "Run failed: status=$STATUS" >&2
cat results/benchmark.log 2>/dev/null || true
exit 1
fi

- name: Upload results
if: always()
uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f # v6.0.0
with:
name: "multiturn_tp${{ inputs.tp }}_users${{ inputs.users }}_offload${{ inputs.offload-mode }}"
path: |
results/metrics_client_metrics.csv
results/metrics_server_metrics.csv
results/metrics_plots.png
results/benchmark.log
results/server.log
results/config.yaml
results/vllm_command.txt
results/benchmark_command.txt
results/status.txt
if-no-files-found: ignore

- name: Upload server logs
if: always()
uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f # v6.0.0
with:
name: "server_logs_tp${{ inputs.tp }}_users${{ inputs.users }}_offload${{ inputs.offload-mode }}"
path: results/server.log
if-no-files-found: ignore

- name: Resource cleanup (post-run)
if: always()
# GitHub Actions workflow YAML does not support anchors/aliases, so the
# pre-run cleanup is repeated here in condensed form instead of an *alias.
run: |
if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
echo "[Docker] Cleaning up resources ..."
docker ps -aq | xargs -r docker rm -f
docker network prune -f
fi
if command -v squeue >/dev/null 2>&1; then
echo "[Slurm] Cleaning up jobs ..."
scancel --name="${{ runner.name }}" 2>/dev/null || true
scancel -u "$USER" 2>/dev/null || true
fi
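The template's `think-time` input is a "mu,sigma" string parameterising a log-normal distribution of per-turn user think times. As an illustration of that convention only (the benchmark client itself is not shown in this diff; `parse_think_time` and `sample_think_time` are hypothetical helper names), the string can be parsed and sampled with the Python standard library:

```python
import random


def parse_think_time(spec: str) -> tuple[float, float]:
    """Parse the workflow's think-time string, e.g. "1.39,1.26" -> (1.39, 1.26)."""
    mu, sigma = (float(x) for x in spec.split(","))
    return mu, sigma


def sample_think_time(mu: float, sigma: float) -> float:
    """Draw one per-turn think time in seconds. mu and sigma are the
    parameters of the underlying normal, as in random.lognormvariate."""
    return random.lognormvariate(mu, sigma)
```

With the sweep default of `1.39,1.26`, the median think time is exp(1.39) ≈ 4 s with a heavy right tail, which is the usual shape for modeling human pauses between chat turns.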
120 changes: 120 additions & 0 deletions .github/workflows/multiturn-sweep.yml
@@ -0,0 +1,120 @@
name: Multi-Turn Benchmark Sweep
run-name: "Multi-Turn Sweep - tp=${{ inputs.tp_values }} users=${{ inputs.user_values }} offload=${{ inputs.offload_values }}"

on:
# push:
# branches:
# - experimental/multi-turn-benchmark
# paths:
# - .github/workflows/multiturn-sweep.yml
workflow_dispatch:
inputs:
tp_values:
description: 'TP sizes (JSON array)'
required: true
default: '[1, 2, 4, 8]'
type: string
user_values:
description: 'Concurrent user counts (JSON array)'
required: true
default: '[8, 16, 32, 64, 128, 256, 512, 1024, 2048]'
type: string
offload_values:
description: 'Offload modes (JSON array: on/off/noprefix)'
required: true
default: '["on", "off", "noprefix"]'
type: string
duration:
description: 'Benchmark duration in seconds'
required: true
default: '300'
type: string
think_time:
description: 'Log-normal think-time params (mu,sigma)'
required: true
default: '1.39,1.26'
type: string
total_cpu_dram_gb:
description: 'Total CPU DRAM for KV offload (GB)'
required: true
default: '100'
type: string
image:
description: 'Container image'
required: true
default: 'vllm/vllm-openai:v0.16.0'
type: string
model:
description: 'Model name'
required: true
default: 'nvidia/Llama-3.3-70B-Instruct-FP4'
type: string
ref:
description: 'Git ref (branch/sha) to checkout'
required: false
type: string

jobs:
# ---------------------------------------------------------------------------
# Matrix benchmark jobs — each cell calls the multiturn template
# ---------------------------------------------------------------------------
sweep:
uses: ./.github/workflows/benchmark-multiturn-tmpl.yml
name: "sweep tp=${{ matrix.tp }} users=${{ matrix.users }} offload=${{ matrix.offload }}"
strategy:
fail-fast: false
matrix:
tp: ${{ fromJson(inputs.tp_values) }}
users: ${{ fromJson(inputs.user_values) }}
offload: ${{ fromJson(inputs.offload_values) }}
secrets: inherit
with:
runner: b200
image: ${{ inputs.image }}
model: ${{ inputs.model }}
exp-name: "multiturn_tp${{ matrix.tp }}_users${{ matrix.users }}_offload${{ matrix.offload }}"
tp: "${{ matrix.tp }}"
users: "${{ matrix.users }}"
offload-mode: ${{ matrix.offload }}
duration: ${{ inputs.duration }}
think-time: ${{ inputs.think_time }}
total-cpu-dram-gb: ${{ inputs.total_cpu_dram_gb }}
ref: ${{ inputs.ref }}

# ---------------------------------------------------------------------------
# Collect & aggregate results
# ---------------------------------------------------------------------------
collect:
runs-on: ubuntu-latest
needs: sweep
if: always()
name: Collect results
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
token: ${{ secrets.REPO_PAT }}
fetch-depth: 1
ref: ${{ inputs.ref || github.ref }}

- uses: actions/setup-python@v5
with:
python-version: '3.11'

- name: Install dependencies
run: pip install pandas matplotlib numpy

- name: Download all artifacts
uses: actions/download-artifact@v4
with:
pattern: 'multiturn_*'
path: results/

- name: Run aggregation
run: |
python experimental/multiturn/vllm_benchmark/scripts/collect_sweep_results.py results/ aggregated/

- name: Upload aggregated results
uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f # v6.0.0
with:
name: multiturn_aggregated
path: aggregated/
5 changes: 4 additions & 1 deletion .gitignore
@@ -1,2 +1,5 @@
**/__pycache__/**
**/.coverage
**/.coverage
# Large data files
experimental/multiturn/vllm_benchmark/sharegpt_20230401_clean_lang_split.json
experimental/qwen_traceA_blksz_16.jsonl