
v4.0: Two-node vllm-spark:26.04 cluster, Qwen3.5-122B FP8, spark hostname #2

Open
cmdlabtech wants to merge 3 commits into main from claude/update-dgx-spark-guide-ym4sf

Conversation

@cmdlabtech
Owner

Summary

  • Hardware rename: all sparky hostname/directory references renamed to spark (keeping -01/-02 suffixes throughout)
  • Model upgrade: Intel/Qwen3.5-122B-A10B-int4-AutoRound → Qwen/Qwen3.5-122B-A10B-FP8; memory figures updated from ~37 GB (INT4) to ~57 GB (FP8), KV cache headroom ~71 GB
  • New container approach: replaced eugr/spark-vllm-docker clone/build/autodiscovery with pre-built vllm-spark:26.04 image + custom launch scripts mounted at runtime
  • Two-service systemd: vllm-head.service (spark-01) and vllm-worker.service (spark-02) replace the old single vllm-cluster.service
  • NCCL config: NCCL_SOCKET_IFNAME=enp1s0f0np0 replaces old IB HCA env vars; DAC subnet updated to 10.100.100.0/30 (10.100.100.1/2)
  • New Steps 1b/1c: create the vllm-head.sh and vllm-worker.sh launch scripts; rsync the scripts to spark-02 and copy the image there with docker save | ssh | docker load
  • LiteLLM model_info: added supports_function_calling, supports_tool_choice, max_context_window: 262144, token limit fields to both node configs
  • SVG diagram: real IPs, correct DAC subnet, updated NCCL env var labels
  • Security: removed all real usernames (cameron → YOUR_USERNAME); confirmed no secrets/keys/passwords exposed
  • Footer: bumped to v4.0
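
For reviewers sanity-checking the updated memory figures, the arithmetic can be sketched as below. This is only an illustration of where numbers like ~57 GB and ~71 GB could come from. It assumes one byte per parameter for FP8, an even split of the weights across the two nodes, and 128 GB of unified memory per node. None of these assumptions is stated in this PR, and the function names are hypothetical.

```python
GIB = 1024 ** 3  # bytes per GiB


def fp8_weights_per_node_gib(params_billion: float, nodes: int) -> float:
    """Approximate per-node weight footprint in GiB.

    Assumes 1 byte per parameter (FP8) and an even split across nodes.
    """
    total_bytes = params_billion * 1e9
    return total_bytes / nodes / GIB


def kv_headroom_gb(node_memory_gb: float, resident_gb: float) -> float:
    """Memory left on one node after the resident model footprint."""
    return node_memory_gb - resident_gb


# 122B parameters split over two nodes (assumed even split).
per_node = fp8_weights_per_node_gib(122, nodes=2)

# 128 GB per node is an assumption; 57 GB is the figure quoted in the summary.
headroom = kv_headroom_gb(node_memory_gb=128, resident_gb=57)

print(f"approx. FP8 weights per node: {per_node:.1f} GiB")
print(f"approx. headroom per node: {headroom} GB")
```

Under these assumptions the per-node weight estimate lands close to the quoted ~57 GB, and the remainder matches the quoted ~71 GB. Real usage also depends on activation memory, non-FP8 tensors, and the gpu-memory-utilization setting.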

Test plan

  • Review updated SVG architecture diagram for correctness (IPs, subnet, env vars)
  • Verify Step 1b script content matches the vllm serve / ray start commands
  • Verify Step 1e Docker run commands have correct VLLM_HOST_IP per node
  • Verify systemd unit files reference correct entrypoints and env vars
  • Check LiteLLM config blocks include model_info on both node configs
  • Confirm no cameron, sparky, AutoRound, eugr, vllm-node-tf5, or run-recipe.sh text remains
  • Confirm no secrets or real credentials are visible in the guide
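
The leftover-text check in the test plan could be automated with a small scan like the one below. It is a minimal sketch: the term list comes from the test plan above, while the function name and the case-insensitive matching are assumptions made for illustration.

```python
# Terms that the test plan says must no longer appear in the guide.
FORBIDDEN_TERMS = [
    "cameron",
    "sparky",
    "AutoRound",
    "eugr",
    "vllm-node-tf5",
    "run-recipe.sh",
]


def find_leftovers(text, terms=FORBIDDEN_TERMS):
    """Return (line_number, term, stripped_line) for each case-insensitive hit."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        lowered = line.lower()
        for term in terms:
            if term.lower() in lowered:
                hits.append((lineno, term, line.strip()))
    return hits


# Small inline sample; in practice the text would be read from the guide file.
sample = "ssh YOUR_USERNAME@spark-01\ncd ~/sparky/scripts\n"
for lineno, term, line in find_leftovers(sample):
    print(f"line {lineno}: found '{term}' in: {line}")
```

Because the match is on the full term "sparky", the renamed hostnames such as spark-01 are not flagged. Matching "cameron" case-insensitively may produce false positives in unrelated text, so hits still need a manual look.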

https://claude.ai/code/session_013LoMSbGUaKy6gSCiuL1W9Z


Generated by Claude Code

claude added 3 commits May 11, 2026 20:05
…park, no real usernames

- Rename all sparky→spark hostname/directory references throughout
- Replace model Intel/Qwen3.5-122B-A10B-int4-AutoRound with Qwen/Qwen3.5-122B-A10B-FP8
- Add Steps 1b/1c: create vllm-head.sh + vllm-worker.sh launch scripts, rsync to spark-02
- Replace vllm cluster launch with two-service systemd (vllm-head.service / vllm-worker.service)
- Update Docker run command: NCCL_SOCKET_IFNAME=enp1s0f0np0 replaces IB HCA env vars
- Add LiteLLM model_info block (function calling, tool choice, 262144 context)
- Update memory figures: ~57 GB resident FP8 (was ~37 GB INT4), ~71 GB KV cache headroom
- Update validation checklist: 57 GB and NCCL socket interface check
- Update performance table: FP8 model, GPU memory utilization row
- Update SVG architecture diagram: real IPs, DAC subnet 10.100.100.0/30, NCCL env vars
- Update file-locations section: new scripts tree, vllm-worker.service on spark-02
- Remove all real usernames (cameron); replace with YOUR_USERNAME throughout
- Ensure no secrets/keys/passwords in public guide
- Bump footer to v4.0

https://claude.ai/code/session_013LoMSbGUaKy6gSCiuL1W9Z
…chas

- vllm-head.sh now starts Ray head inside container (ray start --head), waits
  60 s, then calls vllm serve — reflects actual two-phase startup
- Update startup order throughout: spark-02 worker must start first; head
  starts after worker logs "Ray runtime started"
- Lower max-model-len 262144 → 131072 (higher values OOM on profiling phase)
- Lower gpu-memory-utilization 0.70 → 0.68 (hard limit on GB10 x2 with 122B FP8)
- Update both LiteLLM configs: max_context_window 131072, max_input_tokens 98304,
  add max_tokens: 32768 to litellm_params
- Update performance table: context 131,072, memory-utilization 0.68 row
- Update NCCL verification: expect enp1s0f0np0 socket interface (not NET/IB)
- Fix performance table Network row: NCCL_SOCKET_IFNAME replaces NCCL_IB_HCA
- Rewrite Step 1 intro paragraph to reflect correct container architecture
- Rewrite bootstrap 35B fallback: use sed to edit vllm-head.sh in-place
  (entrypoint override no longer works now that Ray starts inside the script)
- Add four new gotchas: fastsafetensors unsupported, OOM above 0.68,
  startup order failure mode, LiteLLM /v1/models returns no context_length

https://claude.ai/code/session_013LoMSbGUaKy6gSCiuL1W9Z
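
The token limits in the second commit appear to be chosen so that input and output budgets fill the context window exactly (98304 + 32768 = 131072). A minimal sketch of that consistency check follows. The field names are taken from the commit message, but the dictionary layout and the helper name are assumptions and may not match the actual LiteLLM config structure in the guide.

```python
def check_token_budget(model_info, litellm_params):
    """Return a list of problems; an empty list means the limits are consistent."""
    window = model_info["max_context_window"]
    max_input = model_info["max_input_tokens"]
    max_output = litellm_params["max_tokens"]

    problems = []
    if max_input + max_output > window:
        problems.append(
            f"input ({max_input}) + output ({max_output}) exceeds window ({window})"
        )
    return problems


# Values quoted in the commit message; the layout is illustrative only.
model_info = {"max_context_window": 131072, "max_input_tokens": 98304}
litellm_params = {"max_tokens": 32768}

issues = check_token_budget(model_info, litellm_params)
print("token budget OK" if not issues else issues)
```

Running the same check on both node configs would catch a case where one file still carries the earlier 262144 window or mismatched limits.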

2 participants