From 64b610ef142f9c865c582631a7736b9b22364fb7 Mon Sep 17 00:00:00 2001
From: Cryptopoly <31970407+cryptopoly@users.noreply.github.com>
Date: Sun, 17 May 2026 10:47:48 +0100
Subject: [PATCH 01/15] feat: accelerator capability flags + probes (FU-056
 Phase 1)

Foundation for in-app install UX. Lazy importability + version
probes for nunchaku / sageattention / dflash-mlx / dflash-cuda /
triattention / kvpress, plus a Windows-only wsl2 detector that
seeds the upcoming vLLM-via-WSL bridge. Thirteen new fields on
BackendCapabilities surface through /api/health; the placeholder
probe primes them on first paint so the UI never flashes
Install for a package that is actually present.

Probes are resilient to the half-installed failure mode we hit
on Windows (torch directory present but Python source missing):
the find_spec wrapper swallows ImportError / ValueError, and
version reads swallow any import failure or a missing
__version__. The DFlash MLX and CUDA flags delegate to the
existing dflash.is_mlx_available / dflash.is_vllm_available
helpers so the upstream package-layout handling stays in one place.

Tests: 25 in tests/test_accelerator_capabilities.py covering
present / absent / broken-install / WSL-status branches.
---
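
Reviewer note: a minimal standalone sketch of the broken-install
resilience described in the commit message. The two helpers mirror
`_spec_exists` / `_safe_version` from accelerators.py; the
`fake_*_accel` module names are hypothetical, and a module object whose
`__spec__` is `None` is only an assumed stand-in for a half-installed
package (it makes `find_spec` raise `ValueError`).

```python
import importlib
import importlib.machinery
import importlib.util
import sys
import types
from typing import Optional


def spec_exists(module_name: str) -> bool:
    """True only if find_spec resolves the module without raising."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


def safe_version(module_name: str) -> Optional[str]:
    """Read __version__, returning None for absent or broken packages."""
    if not spec_exists(module_name):
        return None
    try:
        module = importlib.import_module(module_name)
    except Exception:
        return None
    version = getattr(module, "__version__", None)
    return str(version) if version is not None else None


# Hypothetical stand-ins: a module with __spec__ = None makes find_spec
# raise ValueError; one with a real spec and __version__ looks healthy.
broken = types.ModuleType("fake_broken_accel")
healthy = types.ModuleType("fake_healthy_accel")
healthy.__spec__ = importlib.machinery.ModuleSpec("fake_healthy_accel", None)
healthy.__version__ = "1.2.3"
sys.modules.update({broken.__name__: broken, healthy.__name__: healthy})
try:
    availability = {
        "json": spec_exists("json"),
        "missing": spec_exists("nunchaku_fake_module_xyz"),
        "broken": spec_exists("fake_broken_accel"),
    }
    version_present = safe_version("fake_healthy_accel")
    version_broken = safe_version("fake_broken_accel")
finally:
    # Remove the fakes so the demo leaves sys.modules untouched.
    sys.modules.pop(broken.__name__, None)
    sys.modules.pop(healthy.__name__, None)

print(availability, version_present, version_broken)
```

The broken stand-in reports "not available" and a `None` version
instead of raising, which is the behaviour the capability payload
relies on.
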
 CLAUDE.md                                 |   1 +
 backend_service/inference/accelerators.py | 201 ++++++++++++++++++
 backend_service/inference/base.py         |  33 +++
 backend_service/inference/capabilities.py |  46 ++++
 tests/test_accelerator_capabilities.py    | 248 ++++++++++++++++++++++
 5 files changed, 529 insertions(+)
 create mode 100644 backend_service/inference/accelerators.py
 create mode 100644 tests/test_accelerator_capabilities.py
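
Reviewer note: a rough sketch of how a consumer of `/api/health` might
turn the new flags into the "Install" / "Installed" state mentioned
above. `install_state` and the label strings are hypothetical (the real
UI lands in later phases); only the `<name>Available` /
`<name>Version` key shape comes from `BackendCapabilities.to_dict` in
this patch, and the sample values are borrowed from the unit test.

```python
from typing import Any, Dict


def install_state(caps: Dict[str, Any], name: str) -> str:
    """Map a capability payload to a button label (hypothetical helper)."""
    available = bool(caps.get(f"{name}Available", False))
    version = caps.get(f"{name}Version")
    if not available:
        return "Install"
    # A present package may still lack a readable version string.
    return f"Installed ({version})" if version else "Installed"


# Sample payload shaped like BackendCapabilities.to_dict() output.
payload = {
    "nunchakuAvailable": True,
    "nunchakuVersion": "1.2.1",
    "sageattentionAvailable": True,
    "sageattentionVersion": None,
    "dflashMlxAvailable": False,
    "dflashMlxVersion": None,
}

labels = {
    name: install_state(payload, name)
    for name in ("nunchaku", "sageattention", "dflashMlx", "kvpress")
}
print(labels)
```

Missing keys fall back to "Install", so an older backend that predates
these fields would not be shown as having the package.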

diff --git a/CLAUDE.md b/CLAUDE.md
index f72565d..660faff 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -173,6 +173,7 @@ no longer relevant.
 | FU-053 | Library status false-positive: distill / sibling variants marked installed when only base repo is on disk | **Shipped 2026-05-17.** Surfaced live: the Video Models tab showed `Wan 2.2 I2V A14B · Distill 4-step (BF16)` + `(FP8)` rows with a green "installed" tick + 117.5 GB sized, but `du -sh ~/.cache/huggingface/hub/models--Wan-AI--Wan2.2-I2V-A14B-Diffusers-distill-{bf16,fp8}` returned nothing — neither distill repo was actually present. Root cause: both catalog variants share `repo: "Wan-AI/Wan2.2-I2V-A14B-Diffusers"` (the BASE, non-distill repo) and route their distinguishing weights via the separate `distillTransformerRepo: "lightx2v/Wan2.2-Distill-Models"` field (FU-019 pattern). The validator at [_video_variant_validation_error](backend_service/helpers/video.py:226) only checked the *base* repo via `_video_download_validation_error` + `_video_full_precision_weights_validation_error` — both passed because the base repo IS on disk. The distill-specific files (`distillTransformerHighNoiseFile` + `distillTransformerLowNoiseFile`) were never checked, so any variant that pinned distill weights via the FU-019 swap mechanism was unconditionally marked installed once the base was downloaded. Fix: new `_distill_transformer_validation_error` helper that, when a variant declares `distillTransformerRepo`, requires (a) that repo's HF snapshot dir exists and (b) both the high-noise + low-noise filenames are present inside it. Wired into `_video_variant_validation_error` so distill variants now flip to "Not installed" when the distill weights are missing. New unit test in [tests/test_video.py](tests/test_video.py) pins both the false-positive regression (base-only → not installed) and the happy path (base + distill snapshot → installed). Live verification: after the fix landed on this M4 Max, the two distill rows correctly dropped out of the "installed" filter. | Trigger / Condition — closed inline. |
 | FU-054 | Same-repo variants: show actual on-disk size + shared-repo badge | **Shipped 2026-05-17.** Wan 2.2 TI2V 5B GGUF (Q4_K_M + Q6_K + Q8_0) renders as three rows in the Video Models tab, each labelled "31.9 GB". On disk those three share ONE `models--QuantStack--Wan2.2-TI2V-5B-GGUF/` dir totalling 12 GB (Q4_K_M=3.2 GB + Q6_K=3.9 GB + Q8_0=5.0 GB), not 95.7 GB as the per-row "31.9 GB" repetition implies. Two surgical changes: (1) Backend `_video_variant_for_payload` now stat-sizes the specific `ggufFile` (not the whole snapshot dir) for variants that pin a single GGUF, exposing `ggufFileBytes` alongside the existing `onDiskBytes`. (2) UI shows the live on-disk byte size next to the catalog estimate when present, and adds a "shares repo with N other variants" badge when ≥2 catalog variants resolve to the same on-disk repo dir. Avoided a full table restructure — the badge gives the user the "deleting this row does/doesn't affect siblings" signal without dragging in expandable parent rows. Frontend changes confined to [VideoModelsTab.tsx](src/features/video/VideoModelsTab.tsx); CSS additions land in [styles.css](src/styles.css) under `.shared-repo-badge`. | Trigger / Condition — closed inline. |
 | FU-055 | Storage explorer panel in Diagnostics tab — surface top disk consumers in-app | **Shipped 2026-05-17.** Complements Stuff Diver's blind spot on HF cache layout: blobs live at `~/.cache/huggingface/hub/models--*/blobs/<sha>` (each a single 5–30 GB safetensors / GGUF shard), but `snapshots/<rev>/<filename>` are symlinks — third-party scanners that don't follow symlinks miss the real bytes. New endpoint `GET /api/diagnostics/storage-top?limit=20` walks every directory under `state.settings.modelDirectories` (HF cache + AI_Models + ~/Models + user dirs), sums per-repo via `du`-equivalent path walk with cycle protection (reuses `_path_size_bytes` from [discovery.py](backend_service/helpers/discovery.py:49)), and returns sorted `[{path, repoLabel, sizeBytes, lastModified, sourceKind}]`. Frontend renders the top-N table in a new "Disk usage" subsection of the Diagnostics tab with a "Reveal in Finder" + "Delete repo" action per row. Cycle protection prevents the `mlx-video-wan` converted-output dirs (which contain symlinks back to HF cache blobs in some configs) from double-counting. Trip-wire numbers from the M4 Max box on first-run: total `~/.cache/huggingface/hub` = 997 GB, top-3 = LTX-2 dev / LTX-2 distilled / LTX-2.3 distilled at ~87 / 87 / 81 GB respectively. | Trigger / Condition — closed inline. |
+| FU-056 | In-app accelerator install UX (Nunchaku / SageAttention / DFlash CUDA / TriAttention / kvpress + vLLM-via-WSL bridge) | Active. Phase 1 shipped on `feature/accelerator-install-ux`; phases 2-9 pending. | The plan: bring every CUDA-side accelerator install in-band so users never need to drop to PowerShell to type `pip install <name>` — the install affordance lives next to the thing it accelerates (FLUX cards in Image Studio Discover, Wan cards in Video Studio Discover, the chat composer for spec-dec, a one-stop "Boost Pack" panel in Diagnostics for the completionist). The backend pipeline (`POST /api/setup/install-package` → background or sync install → capability re-probe → UI refresh) is already proven by FU-008 / FU-016 / FU-019 / FU-023 / FU-025 — what's missing is per-accelerator capability flags + the contextual badges/buttons on each feature surface. **Phase 1 (foundation, shipped 2026-05-17):** new probe module [backend_service/inference/accelerators.py](backend_service/inference/accelerators.py) with lazy importability + version helpers for nunchaku, sageattention, dflash-mlx, dflash-cuda, triattention, kvpress, plus a Windows-only `wsl2_available()` shell probe for Phase 8. 13 new fields on `BackendCapabilities` ([base.py](backend_service/inference/base.py)) + matching serialization in `to_dict`. Probes wired into both the cheap placeholder probe and the full `_probe_native_backends` ([capabilities.py](backend_service/inference/capabilities.py)) so the frontend gets accurate "Install" / "Installed" state on first paint. 25 unit tests in [tests/test_accelerator_capabilities.py](tests/test_accelerator_capabilities.py) pin the present / absent / broken-install matrix. **Phases 2-9 planned:** (2) reusable `<AcceleratorCard>` component, (3) Image Studio Discover/Models badges + post-generation suggestion toast, (4) Video Studio Discover/Models badges + LongLive bundle rename, (5) Chat composer hint when CUDA + DRAFT_MODEL_MAP hit, (6) Diagnostics "Boost Pack" panel (one-stop view), (7) per-variant `recommendedAccelerators` catalog metadata + i18n, (8) Windows vLLM-via-WSL2 bridge (WSL detector + isolated venv install + remote subprocess engine), (9) cache-strategy-matrix runner + pre-build gate integration. End-state UX: fresh user installs ChaosEngineAI → downloads FLUX → sees "🚀 Nunchaku +3× available [Install]" pill on the catalog card → one click → 90s later first generation runs at SVDQuant speed, no terminal required. |
 | FU-049 | Python 3.14 support gate | Re-evaluate quarterly. Trigger to bump `requires-python` floor: torch ≥2.6 publishes stable cp314 wheels for darwin-arm64 + win-amd64 + linux-x86_64 (CUDA + CPU) **AND** mlx-lm + mlx-vlm + mlx-video publish cp314 wheels **AND** Astral `python-build-standalone` ships a 3.14 portable build for the Tauri sidecar. | Today `pyproject.toml` declares `requires-python = ">=3.10"` and the test matrix runs 3.11/3.12 ([scripts/e2e_test_suite.py](scripts/e2e_test_suite.py), Windows test guide, CI). Stay on 3.11/3.12 for ship + test until cp314 wheel coverage closes. **Why 3.14 is on radar:** (a) we already renamed `compression/` → `cache_compression/` to avoid shadowing Python 3.14's new stdlib `compression` namespace pkg — pre-emptive fix landed in v0.8.0. (b) 3.14 ships PEP 779 free-threaded build as stable (still opt-in via `python3.14t` builds), interesting for the FastAPI parent process but irrelevant for subprocess-isolated MLX / sd-cli / longlive workers. (c) GIL-default 3.14 still gives modest perf wins from tail-call interpreter (~5% on CPython benchmarks) + new sub-interpreter API. **3.14 blockers as of 2026-05-17:** (1) **PyTorch** — torch 2.5 stable + 2.6 nightly do not yet publish cp314 wheels on the full darwin-arm64 + win-amd64 + linux-x86_64 × {CPU, CUDA 12.4, ROCm 6.2, MPS} matrix; most painful single dep since image + video runtimes pin torch. (2) **MLX stack** — `mlx`, `mlx-lm`, `mlx-vlm`, `mlx-video` currently ship cp310–cp313 wheels; Apple Silicon adoption typically lags CPython release by 1–3 months. (3) **CUDA-compiled deps** — `bitsandbytes`, `flash-attn`, `sageattention`, `nunchaku`, `triattention`, `vllm-swift`, `dflash-mlx` (git+url, builds from source — needs cp314 cython/setuptools chain too). (4) **Tauri sidecar** — desktop release builds embed Python via Astral's `python-build-standalone`; need their 3.14 portable build before bundling. (5) **3.14 stdlib breakage** — deprecation removals (e.g. `typing.io`, `typing.re`, `asyncio.coroutine` shim, `pkg_resources` consequences); needs an audit pass via `python -W error::DeprecationWarning -m pytest tests/` on a 3.14 venv. (6) **3.14 `compression` namespace pkg** — already mitigated, but a regression probe should land in `pre-build-check` once we run CI on 3.14 (assert `from cache_compression import registry` works on cp314). **Plan when gate opens:** (a) bump `requires-python` to `>=3.11` first as an intermediate step (drops 3.10, which has no cp314 wheels anyway and lets us drop a few back-compat code paths); (b) add cp314 to GitHub Actions CI matrix alongside cp311 + cp312; (c) add a `python -X importtime` regression probe to `scripts/perf-baseline.py` (3.14's tail-call interpreter should improve cold-start by ~3–5%, want to measure); (d) bump `requires-python` to `>=3.11` floor + `<3.15` ceiling once green; (e) the Tauri sidecar Python pin advances independently — driven by `python-build-standalone` releases not pyproject. |
 
 ---
diff --git a/backend_service/inference/accelerators.py b/backend_service/inference/accelerators.py
new file mode 100644
index 0000000..5b73f23
--- /dev/null
+++ b/backend_service/inference/accelerators.py
@@ -0,0 +1,201 @@
+"""Probe helpers for CUDA-side accelerator packages (FU-056 Phase 1).
+
+Lazy importability + version probes for the five accelerators the
+Setup tab + per-feature install panels expose:
+
+- **nunchaku** — SVDQuant 4-bit transformers for FLUX / SD3.5 / Qwen-Image
+  (FU-023). Pulled in by ``ImageStudio`` when a DiT pipeline loads with
+  ``nunchakuRepo`` pinned. CUDA-only at runtime, but the import itself
+  succeeds on any platform so the capability flag tracks "package usable"
+  rather than "package will accelerate this machine".
+- **sageattention** — fast attention kernels for DiT pipelines on CUDA
+  (FU-016). Stacks multiplicatively with FBCache / Nunchaku. No-op on
+  Apple Silicon and on UNet pipelines.
+- **dflash CUDA** — PyTorch/CUDA half of the speculative decoding family
+  (FU-031, FU-048). ``dflash.is_vllm_available()`` already exists in the
+  local ``dflash/__init__.py`` wrapper and inspects the ``dflash.model``
+  submodule, so we delegate to it rather than re-detecting here.
+- **triattention** — vLLM compressor used by FU-003 LongLive on CUDA
+  and FU-002 on Apple Silicon. The pip name + import name agree
+  (``triattention``).
+- **kvpress** — NVIDIA KV cache compression toolkit (FU-027). Already
+  registered in ``_INSTALLABLE_PIP_PACKAGES`` but had no capability flag
+  before this phase; integration code arrives in a later phase, but the
+  install button needs the flag to gate "Installed ✓" state.
+
+Plus a Windows-specific ``wsl2_available()`` helper used by the future
+Phase 8 vLLM-via-WSL bridge. On macOS/Linux it's always ``False`` — the
+flag only carries weight on Windows where ``vllm`` has no native wheels.
+
+Probes are deliberately lazy: every ``import`` lives inside a function
+body so ``python -X importtime -c "import backend_service.app"`` stays
+under the 2 s cold-start budget (per CLAUDE.md performance guidelines).
+The companion ``_version`` helpers return ``None`` if the package isn't
+installed, so callers don't need a separate availability check first.
+"""
+
+from __future__ import annotations
+
+import importlib
+import importlib.util
+import subprocess
+import sys
+
+
+def _spec_exists(module_name: str) -> bool:
+    """``importlib.util.find_spec`` wrapper that swallows ImportError / ValueError.
+
+    ``find_spec`` can raise on partially-broken installs (e.g. a torch
+    directory that exists on disk but has no ``__init__.py``); see the
+    Windows torch install bug investigated 2026-05-17. We treat those
+    raises as "not available" so the capability resolver never crashes on
+    a half-installed package.
+    """
+    try:
+        return importlib.util.find_spec(module_name) is not None
+    except (ImportError, ValueError):
+        return False
+
+
+def _safe_version(module_name: str) -> str | None:
+    """Read ``__version__`` without crashing on broken installs.
+
+    Mirrors the half-broken-install resilience of ``_spec_exists``: a
+    package that registers an import spec but has no Python source (the
+    Windows ``torch/`` failure mode) raises on attribute access, not on
+    ``find_spec``. Catching here keeps the capability payload honest.
+    """
+    if not _spec_exists(module_name):
+        return None
+    try:
+        module = importlib.import_module(module_name)
+    except Exception:
+        return None
+    version = getattr(module, "__version__", None)
+    return str(version) if version is not None else None
+
+
+# ---------------------------------------------------------------------------
+# Nunchaku — FU-023
+# ---------------------------------------------------------------------------
+
+def nunchaku_available() -> bool:
+    return _spec_exists("nunchaku")
+
+
+def nunchaku_version() -> str | None:
+    return _safe_version("nunchaku")
+
+
+# ---------------------------------------------------------------------------
+# SageAttention — FU-016
+# ---------------------------------------------------------------------------
+
+def sageattention_available() -> bool:
+    return _spec_exists("sageattention")
+
+
+def sageattention_version() -> str | None:
+    return _safe_version("sageattention")
+
+
+# ---------------------------------------------------------------------------
+# DFlash — FU-031 (MLX side) + FU-048 (CUDA side)
+#
+# Two flags here because the two backends live in two separate pip
+# packages with two import names (``dflash_mlx`` for Apple Silicon,
+# ``dflash.model`` for CUDA). The shared ``dflash`` integration module
+# already exposes detection helpers; reuse them so the wrapping stays
+# in one place if the upstream package layout changes.
+# ---------------------------------------------------------------------------
+
+def dflash_mlx_available() -> bool:
+    """``dflash_mlx`` (Apple Silicon) — the MLX-native draft runner."""
+    try:
+        from dflash import is_mlx_available
+    except ImportError:
+        return False
+    try:
+        return bool(is_mlx_available())
+    except Exception:
+        return False
+
+
+def dflash_cuda_available() -> bool:
+    """``dflash`` PyPI package (CUDA) — the PyTorch/CUDA draft runner.
+
+    Uses the integration module's existing helper, which checks for the
+    ``dflash.model`` submodule specifically (the local ``dflash/`` wrapper
+    in this repo shadows the bare ``dflash`` import, so the submodule
+    check is what disambiguates "real upstream package" from "our shim").
+    """
+    try:
+        from dflash import is_vllm_available
+    except ImportError:
+        return False
+    try:
+        return bool(is_vllm_available())
+    except Exception:
+        return False
+
+
+def dflash_mlx_version() -> str | None:
+    return _safe_version("dflash_mlx")
+
+
+def dflash_cuda_version() -> str | None:
+    """The CUDA wheel exposes its version via ``dflash.model.__version__``
+    when installed, but our local wrapper ``dflash/__init__.py`` shadows
+    the bare name. Probe the submodule path the upstream package owns.
+    """
+    if not dflash_cuda_available():
+        return None
+    return _safe_version("dflash.model")
+
+
+# ---------------------------------------------------------------------------
+# TriAttention — FU-002 (MLX) + FU-003 LongLive (CUDA)
+# ---------------------------------------------------------------------------
+
+def triattention_available() -> bool:
+    return _spec_exists("triattention")
+
+
+def triattention_version() -> str | None:
+    return _safe_version("triattention")
+
+
+# ---------------------------------------------------------------------------
+# kvpress — FU-027 (capability flag now; integration in a later phase)
+# ---------------------------------------------------------------------------
+
+def kvpress_available() -> bool:
+    return _spec_exists("kvpress")
+
+
+def kvpress_version() -> str | None:
+    return _safe_version("kvpress")
+
+
+# ---------------------------------------------------------------------------
+# WSL2 — Windows-only bridge for vLLM (FU-056 Phase 8)
+#
+# Pure no-op on macOS / Linux. On Windows we shell ``wsl --status`` with
+# a tight timeout. The two-second timeout covers cold WSL service starts
+# without hanging the capability probe — repeated calls are throttled by
+# the capability cache, so a slow first probe doesn't compound.
+# ---------------------------------------------------------------------------
+
+def wsl2_available() -> bool:
+    if sys.platform != "win32":
+        return False
+    try:
+        result = subprocess.run(
+            ["wsl", "--status"],
+            capture_output=True,
+            timeout=2.0,
+            check=False,
+        )
+    except (FileNotFoundError, subprocess.TimeoutExpired, OSError):
+        return False
+    return result.returncode == 0
diff --git a/backend_service/inference/base.py b/backend_service/inference/base.py
index c1782f2..47119ac 100644
--- a/backend_service/inference/base.py
+++ b/backend_service/inference/base.py
@@ -97,6 +97,26 @@ class BackendCapabilities:
     # help text. The UI keys an MTP affordance for GGUF models off this
     # alongside mtplxAvailable for MLX models.
     ggufMtpAvailable: bool = False
+    # FU-056 Phase 1: CUDA-side accelerator capability flags. The Setup
+    # tab + per-feature install panels gate "Install" vs "Installed" UI
+    # off these. ``dflashMlxAvailable`` and ``dflashCudaAvailable`` are
+    # separate because the two backends live in two pip packages, even
+    # though the user-facing "DFlash" affordance is the same feature on
+    # both platforms. ``wsl2Available`` is Windows-only and seeds the
+    # Phase 8 vLLM-via-WSL bridge — always ``False`` on macOS / Linux.
+    nunchakuAvailable: bool = False
+    nunchakuVersion: str | None = None
+    sageattentionAvailable: bool = False
+    sageattentionVersion: str | None = None
+    dflashMlxAvailable: bool = False
+    dflashMlxVersion: str | None = None
+    dflashCudaAvailable: bool = False
+    dflashCudaVersion: str | None = None
+    triattentionAvailable: bool = False
+    triattentionVersion: str | None = None
+    kvpressAvailable: bool = False
+    kvpressVersion: str | None = None
+    wsl2Available: bool = False
     probing: bool = False
 
     def to_dict(self) -> dict[str, Any]:
@@ -118,6 +138,19 @@ def to_dict(self) -> dict[str, Any]:
             "mtplxAvailable": self.mtplxAvailable,
             "mtplxPythonPath": self.mtplxPythonPath,
             "ggufMtpAvailable": self.ggufMtpAvailable,
+            "nunchakuAvailable": self.nunchakuAvailable,
+            "nunchakuVersion": self.nunchakuVersion,
+            "sageattentionAvailable": self.sageattentionAvailable,
+            "sageattentionVersion": self.sageattentionVersion,
+            "dflashMlxAvailable": self.dflashMlxAvailable,
+            "dflashMlxVersion": self.dflashMlxVersion,
+            "dflashCudaAvailable": self.dflashCudaAvailable,
+            "dflashCudaVersion": self.dflashCudaVersion,
+            "triattentionAvailable": self.triattentionAvailable,
+            "triattentionVersion": self.triattentionVersion,
+            "kvpressAvailable": self.kvpressAvailable,
+            "kvpressVersion": self.kvpressVersion,
+            "wsl2Available": self.wsl2Available,
             "probing": self.probing,
         }
 
diff --git a/backend_service/inference/capabilities.py b/backend_service/inference/capabilities.py
index 0f3a0c5..ca82854 100644
--- a/backend_service/inference/capabilities.py
+++ b/backend_service/inference/capabilities.py
@@ -16,6 +16,21 @@
 from pathlib import Path
 
 from backend_service.inference._constants import CAPABILITY_CACHE_TTL_SECONDS
+from backend_service.inference.accelerators import (
+    dflash_cuda_available,
+    dflash_cuda_version,
+    dflash_mlx_available,
+    dflash_mlx_version,
+    kvpress_available,
+    kvpress_version,
+    nunchaku_available,
+    nunchaku_version,
+    sageattention_available,
+    sageattention_version,
+    triattention_available,
+    triattention_version,
+    wsl2_available,
+)
 from backend_service.inference.base import BackendCapabilities
 from backend_service.inference.binaries import (
     _json_subprocess,
@@ -59,6 +74,10 @@ def _initial_backend_capabilities() -> BackendCapabilities:
     llama_server_turbo_path = _resolve_llama_server_turbo()
     llama_cli_path = _resolve_llama_cli()
     mtplx_available, mtplx_python = _detect_mtplx()
+    # FU-056 Phase 1: prime accelerator flags during the placeholder phase
+    # too. Most availability checks are a single ``find_spec``; the version
+    # reads and DFlash helpers do import packages. The UI gets accurate
+    # "Install" vs "Installed" state on first render, before the MLX probe.
     return BackendCapabilities(
         pythonExecutable=python_executable,
         mlxAvailable=False,
@@ -74,6 +93,19 @@ def _initial_backend_capabilities() -> BackendCapabilities:
         vllmVersion=None,
         mtplxAvailable=mtplx_available,
         mtplxPythonPath=mtplx_python,
+        nunchakuAvailable=nunchaku_available(),
+        nunchakuVersion=nunchaku_version(),
+        sageattentionAvailable=sageattention_available(),
+        sageattentionVersion=sageattention_version(),
+        dflashMlxAvailable=dflash_mlx_available(),
+        dflashMlxVersion=dflash_mlx_version(),
+        dflashCudaAvailable=dflash_cuda_available(),
+        dflashCudaVersion=dflash_cuda_version(),
+        triattentionAvailable=triattention_available(),
+        triattentionVersion=triattention_version(),
+        kvpressAvailable=kvpress_available(),
+        kvpressVersion=kvpress_version(),
+        wsl2Available=wsl2_available(),
         probing=True,
     )
 
@@ -133,6 +165,20 @@ def _probe_native_backends() -> BackendCapabilities:
         mtplxAvailable=mtplx_available,
         mtplxPythonPath=mtplx_python,
         ggufMtpAvailable=gguf_mtp_available,
+        # FU-056 Phase 1: per-accelerator import + version probes.
+        nunchakuAvailable=nunchaku_available(),
+        nunchakuVersion=nunchaku_version(),
+        sageattentionAvailable=sageattention_available(),
+        sageattentionVersion=sageattention_version(),
+        dflashMlxAvailable=dflash_mlx_available(),
+        dflashMlxVersion=dflash_mlx_version(),
+        dflashCudaAvailable=dflash_cuda_available(),
+        dflashCudaVersion=dflash_cuda_version(),
+        triattentionAvailable=triattention_available(),
+        triattentionVersion=triattention_version(),
+        kvpressAvailable=kvpress_available(),
+        kvpressVersion=kvpress_version(),
+        wsl2Available=wsl2_available(),
     )
 
 
diff --git a/tests/test_accelerator_capabilities.py b/tests/test_accelerator_capabilities.py
new file mode 100644
index 0000000..a9c00dc
--- /dev/null
+++ b/tests/test_accelerator_capabilities.py
@@ -0,0 +1,248 @@
+"""Tests for FU-056 Phase 1 accelerator capability probes.
+
+Covers ``backend_service/inference/accelerators.py`` and its wiring
+into ``BackendCapabilities.to_dict``. The probes are intentionally
+boring — we're mostly pinning the "package present / package absent /
+package broken" matrix so future regressions can't silently flip the
+UI gating that downstream phases depend on.
+"""
+
+from __future__ import annotations
+
+import sys
+import unittest
+from unittest.mock import MagicMock, patch
+
+from backend_service.inference import accelerators
+from backend_service.inference.base import BackendCapabilities
+
+
+class SpecExistsTests(unittest.TestCase):
+    def test_returns_true_when_module_resolvable(self):
+        # ``json`` is in the stdlib — always findable.
+        self.assertTrue(accelerators._spec_exists("json"))
+
+    def test_returns_false_when_module_absent(self):
+        self.assertFalse(accelerators._spec_exists("nunchaku_fake_module_xyz"))
+
+    def test_swallows_partial_install_raise(self):
+        with patch(
+            "backend_service.inference.accelerators.importlib.util.find_spec",
+            side_effect=ValueError("broken __spec__"),
+        ):
+            self.assertFalse(accelerators._spec_exists("anything"))
+
+
+class SafeVersionTests(unittest.TestCase):
+    def test_returns_none_when_module_absent(self):
+        self.assertIsNone(accelerators._safe_version("nunchaku_fake_module_xyz"))
+
+    def test_returns_version_string_when_present(self):
+        fake_module = MagicMock(__version__="1.2.3")
+        with patch.object(accelerators, "_spec_exists", return_value=True):
+            with patch.object(
+                accelerators.importlib,
+                "import_module",
+                return_value=fake_module,
+            ):
+                self.assertEqual(accelerators._safe_version("anything"), "1.2.3")
+
+    def test_returns_none_when_module_lacks_version(self):
+        fake_module = MagicMock(spec=[])  # no __version__ attribute
+        with patch.object(accelerators, "_spec_exists", return_value=True):
+            with patch.object(
+                accelerators.importlib,
+                "import_module",
+                return_value=fake_module,
+            ):
+                self.assertIsNone(accelerators._safe_version("anything"))
+
+    def test_swallows_import_failure(self):
+        with patch.object(accelerators, "_spec_exists", return_value=True):
+            with patch.object(
+                accelerators.importlib,
+                "import_module",
+                side_effect=ImportError("broken native ext"),
+            ):
+                self.assertIsNone(accelerators._safe_version("anything"))
+
+
+class PerAcceleratorAvailabilityTests(unittest.TestCase):
+    """Each accelerator's ``*_available()`` helper must flip cleanly on
+    ``find_spec`` answers. Patching ``_spec_exists`` rather than
+    ``find_spec`` keeps the test independent of how the real probes
+    are implemented underneath."""
+
+    def test_nunchaku_available_true(self):
+        with patch.object(accelerators, "_spec_exists", return_value=True):
+            self.assertTrue(accelerators.nunchaku_available())
+
+    def test_nunchaku_available_false(self):
+        with patch.object(accelerators, "_spec_exists", return_value=False):
+            self.assertFalse(accelerators.nunchaku_available())
+
+    def test_sageattention_available_true(self):
+        with patch.object(accelerators, "_spec_exists", return_value=True):
+            self.assertTrue(accelerators.sageattention_available())
+
+    def test_triattention_available_true(self):
+        with patch.object(accelerators, "_spec_exists", return_value=True):
+            self.assertTrue(accelerators.triattention_available())
+
+    def test_kvpress_available_true(self):
+        with patch.object(accelerators, "_spec_exists", return_value=True):
+            self.assertTrue(accelerators.kvpress_available())
+
+
+class DflashAvailabilityTests(unittest.TestCase):
+    """DFlash MLX / CUDA flags delegate to ``dflash.is_mlx_available`` and
+    ``dflash.is_vllm_available``. Patch those to drive the branch matrix."""
+
+    def test_mlx_available_when_helper_returns_true(self):
+        with patch("dflash.is_mlx_available", return_value=True, create=True):
+            self.assertTrue(accelerators.dflash_mlx_available())
+
+    def test_mlx_unavailable_when_helper_returns_false(self):
+        with patch("dflash.is_mlx_available", return_value=False, create=True):
+            self.assertFalse(accelerators.dflash_mlx_available())
+
+    def test_mlx_unavailable_when_helper_raises(self):
+        with patch("dflash.is_mlx_available", side_effect=RuntimeError("boom"), create=True):
+            self.assertFalse(accelerators.dflash_mlx_available())
+
+    def test_cuda_available_when_helper_returns_true(self):
+        with patch("dflash.is_vllm_available", return_value=True, create=True):
+            self.assertTrue(accelerators.dflash_cuda_available())
+
+    def test_cuda_unavailable_when_helper_returns_false(self):
+        with patch("dflash.is_vllm_available", return_value=False, create=True):
+            self.assertFalse(accelerators.dflash_cuda_available())
+
+    def test_cuda_version_returns_none_when_unavailable(self):
+        with patch("dflash.is_vllm_available", return_value=False, create=True):
+            self.assertIsNone(accelerators.dflash_cuda_version())
+
+
+class Wsl2AvailableTests(unittest.TestCase):
+    def test_returns_false_off_windows(self):
+        with patch.object(accelerators.sys, "platform", "linux"):
+            self.assertFalse(accelerators.wsl2_available())
+        with patch.object(accelerators.sys, "platform", "darwin"):
+            self.assertFalse(accelerators.wsl2_available())
+
+    def test_returns_true_when_wsl_status_succeeds(self):
+        fake_result = MagicMock(returncode=0)
+        with patch.object(accelerators.sys, "platform", "win32"):
+            with patch.object(
+                accelerators.subprocess,
+                "run",
+                return_value=fake_result,
+            ) as run_mock:
+                self.assertTrue(accelerators.wsl2_available())
+                run_mock.assert_called_once()
+                self.assertEqual(run_mock.call_args.args[0][0], "wsl")
+                self.assertEqual(run_mock.call_args.args[0][1], "--status")
+
+    def test_returns_false_when_wsl_status_fails(self):
+        fake_result = MagicMock(returncode=1)
+        with patch.object(accelerators.sys, "platform", "win32"):
+            with patch.object(accelerators.subprocess, "run", return_value=fake_result):
+                self.assertFalse(accelerators.wsl2_available())
+
+    def test_returns_false_when_wsl_not_installed(self):
+        with patch.object(accelerators.sys, "platform", "win32"):
+            with patch.object(
+                accelerators.subprocess,
+                "run",
+                side_effect=FileNotFoundError(),
+            ):
+                self.assertFalse(accelerators.wsl2_available())
+
+    def test_returns_false_on_subprocess_timeout(self):
+        with patch.object(accelerators.sys, "platform", "win32"):
+            with patch.object(
+                accelerators.subprocess,
+                "run",
+                side_effect=accelerators.subprocess.TimeoutExpired(cmd="wsl", timeout=2.0),
+            ):
+                self.assertFalse(accelerators.wsl2_available())
+
+
+class BackendCapabilitiesToDictTests(unittest.TestCase):
+    """The frontend reads accelerator flags via ``/api/health``. Pin
+    the serialized payload so a future field rename (or a forgetful
+    ``to_dict`` update) gets caught here rather than in a vague UI bug."""
+
+    def test_to_dict_includes_every_accelerator_field(self):
+        caps = BackendCapabilities(
+            pythonExecutable="/x/python",
+            mlxAvailable=False,
+            mlxLmAvailable=False,
+            mlxUsable=False,
+            nunchakuAvailable=True,
+            nunchakuVersion="1.2.1",
+            sageattentionAvailable=True,
+            sageattentionVersion="2.2.0",
+            dflashMlxAvailable=False,
+            dflashMlxVersion=None,
+            dflashCudaAvailable=True,
+            dflashCudaVersion="0.1.0",
+            triattentionAvailable=True,
+            triattentionVersion="0.2.0",
+            kvpressAvailable=False,
+            kvpressVersion=None,
+            wsl2Available=True,
+        )
+        payload = caps.to_dict()
+        for key in (
+            "nunchakuAvailable",
+            "nunchakuVersion",
+            "sageattentionAvailable",
+            "sageattentionVersion",
+            "dflashMlxAvailable",
+            "dflashMlxVersion",
+            "dflashCudaAvailable",
+            "dflashCudaVersion",
+            "triattentionAvailable",
+            "triattentionVersion",
+            "kvpressAvailable",
+            "kvpressVersion",
+            "wsl2Available",
+        ):
+            self.assertIn(key, payload, f"{key} missing from to_dict payload")
+        self.assertTrue(payload["nunchakuAvailable"])
+        self.assertEqual(payload["sageattentionVersion"], "2.2.0")
+        self.assertFalse(payload["dflashMlxAvailable"])
+        self.assertTrue(payload["wsl2Available"])
+
+    def test_defaults_render_as_false_and_none(self):
+        caps = BackendCapabilities(
+            pythonExecutable="/x/python",
+            mlxAvailable=False,
+            mlxLmAvailable=False,
+            mlxUsable=False,
+        )
+        payload = caps.to_dict()
+        for flag in (
+            "nunchakuAvailable",
+            "sageattentionAvailable",
+            "dflashMlxAvailable",
+            "dflashCudaAvailable",
+            "triattentionAvailable",
+            "kvpressAvailable",
+            "wsl2Available",
+        ):
+            self.assertFalse(payload[flag], f"{flag} should default False")
+        for version in (
+            "nunchakuVersion",
+            "sageattentionVersion",
+            "dflashMlxVersion",
+            "dflashCudaVersion",
+            "triattentionVersion",
+            "kvpressVersion",
+        ):
+            self.assertIsNone(payload[version], f"{version} should default None")
+
+
+if __name__ == "__main__":
+    unittest.main()

From 36ecee94b84dd219387be841f448f72c85573fea Mon Sep 17 00:00:00 2001
From: Cryptopoly <31970407+cryptopoly@users.noreply.github.com>
Date: Sun, 17 May 2026 10:55:19 +0100
Subject: [PATCH 02/15] test: pytest auto-loads installed app extras + runners
 point at app

Tests should exercise the same install users have, not a parallel
.venv install. New tests/conftest.py calls ensure_extras_on_sys_path
at collection time, so pytest tests/ resolves torch / diffusers /
mlx / nunchaku / sageattention / triattention / vllm against the
persistent extras dir at:

  Windows: %LOCALAPPDATA%\ChaosEngineAI\extras\cp{XY}\site-packages
  macOS:   ~/Library/Application Support/ChaosEngineAI/extras/cp{XY}/site-packages
  Linux:   ${XDG_DATA_HOME}/ChaosEngineAI/extras/cp{XY}/site-packages

A torch upgrade landing via the in-app installer is reflected in the
next pytest run automatically; no pip install dance in .venv. On a
fresh CI box without the extras dir the conftest is a silent no-op,
so existing test boxes keep working.

Set CHAOSENGINE_TEST_TRACE_EXTRAS=1 to log which extras path got
loaded for a given run.
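
For reviewers who have not read runtime_paths.py, the discovery step
described above can be sketched roughly as follows. This is a minimal
illustration, not the real ensure_extras_on_sys_path: the function
names, the reading of cp{XY} as the running interpreter's major/minor
version, and the fallbacks used when LOCALAPPDATA or XDG_DATA_HOME is
unset are all assumptions.

```python
from __future__ import annotations

import sys
from pathlib import Path


def extras_site_packages(platform: str, env: dict[str, str], home: Path) -> Path:
    """Hypothetical resolver for the per-platform extras dir listed above."""
    # Assumption: cp{XY} is the running interpreter's major/minor version.
    tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
    if platform == "win32":
        # Assumed fallback when LOCALAPPDATA is not set.
        base = Path(env.get("LOCALAPPDATA") or str(home / "AppData" / "Local"))
    elif platform == "darwin":
        base = home / "Library" / "Application Support"
    else:
        # Assumed fallback: the XDG default data dir.
        base = Path(env.get("XDG_DATA_HOME") or str(home / ".local" / "share"))
    return base / "ChaosEngineAI" / "extras" / tag / "site-packages"


def append_if_present(path: Path, sys_path: list[str]) -> list[Path]:
    """Append (never prepend) the extras dir; no-op if missing or already listed."""
    if not path.is_dir() or str(path) in sys_path:
        return []
    sys_path.append(str(path))
    return [path]
```

Appending keeps repo-local packages ahead of the extras dir on the
import path, and returning an empty list for a missing directory is
what makes a fresh CI box a silent no-op.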

Runners (e2e_test_suite.py, cache-strategy-matrix.py) now print an
actionable hint when the backend is not reachable ("open the
ChaosEngineAI app") rather than just "backend not reachable; aborting".
They still exit 2 and 3 respectively, so CI gates stay reliable.
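
The runner behaviour amounts to a reachability probe followed by a hint
and a fixed exit code. A minimal sketch, assuming a plain HTTP GET on
/api/health is enough to decide reachability; backend_reachable and run
are hypothetical names, not the helpers the scripts actually use:

```python
from __future__ import annotations

import sys
import urllib.error
import urllib.request


def backend_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True when GET /api/health answers with HTTP 200."""
    url = f"http://{host}:{port}/api/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx response.
        return False


def run(host: str = "127.0.0.1", port: int = 8876, unreachable_exit: int = 2) -> int:
    """Print the actionable hint and return the gate's exit code when unreachable."""
    if not backend_reachable(host, port):
        print(f"backend not reachable at http://{host}:{port}/api/health", file=sys.stderr)
        print("open the ChaosEngineAI app and re-run this command", file=sys.stderr)
        return unreachable_exit
    return 0
```

Keeping the exit code as a parameter mirrors the two runners: the E2E
suite would pass 2 and the matrix runner 3, so existing CI gates keep
their meaning while the message changes.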

Docs (testing/overview.md, testing/e2e-testing.md) updated with the
canonical open-the-app-then-run-tests flow, with the headless dev
backend kept as an advanced option for contributors.
---
 docs/testing/e2e-testing.md      | 22 ++++++++++++
 docs/testing/overview.md         | 56 +++++++++++++++++++++++++++---
 scripts/cache-strategy-matrix.py | 15 +++++++++
 scripts/e2e_test_suite.py        | 23 ++++++++++++-
 tests/conftest.py                | 58 ++++++++++++++++++++++++++++++++
 5 files changed, 168 insertions(+), 6 deletions(-)
 create mode 100644 tests/conftest.py

diff --git a/docs/testing/e2e-testing.md b/docs/testing/e2e-testing.md
index 20c67b3..5344c86 100644
--- a/docs/testing/e2e-testing.md
+++ b/docs/testing/e2e-testing.md
@@ -60,6 +60,24 @@ alarm.
 
 ### Full sweep (every phase, every check)
 
+The canonical run path is **against the installed app**, so the suite
+exercises the same embedded runtime + extras dir that users have:
+
+```bash
+# 1. Open ChaosEngineAI (Tauri shell launches the backend on 8876)
+# 2. From any shell:
+.venv/bin/python scripts/e2e_test_suite.py
+```
+
+The runner errors with an actionable hint if the backend isn't
+reachable (exit code 2). It will not silently fall back to a custom
+dev backend.
+
+#### Headless dev backend (advanced)
+
+For contributors iterating on the suite itself or running it in CI
+without the desktop shell:
+
 ```bash
 # In one shell — keep the backend running for the entire suite
 ./scripts/chaosengine-cli serve
@@ -68,6 +86,10 @@ alarm.
 ./scripts/e2e_test_suite.py
 ```
 
+This works but doesn't exercise the `python-build-standalone` Python
+that ships in the desktop bundle — for release validation, prefer the
+installed-app path.
+
 Wall time depends on hardware and which models are on disk. M-series with
 27B MLX models on hand: 10–25 minutes. Add another 10–20 if Phase 4 / 5
 actually run generation (depends on installed image/video pipelines).
diff --git a/docs/testing/overview.md b/docs/testing/overview.md
index 0aa4810..f18ae38 100644
--- a/docs/testing/overview.md
+++ b/docs/testing/overview.md
@@ -26,21 +26,67 @@ release.
 
 ## Required commands
 
+ChaosEngineAI tests run against **the installed app's runtime** — the
+same torch / diffusers / mlx / nunchaku / etc. wheels users have
+installed via the in-app "Install GPU runtime" + per-feature install
+buttons. No custom dev setup. The flow is:
+
+1. Open the ChaosEngineAI app (the Tauri shell launches the backend
+   on port 8876 and adds the persistent extras dir to its `PYTHONPATH`).
+2. From any shell, run the test suites below.
+
 ```bash
-# Python tests
+# Python tests — auto-loads the app's extras dir via tests/conftest.py
 .venv/bin/python -m pytest tests/ -q
 
-# TypeScript tests
+# TypeScript tests — no backend dependency
 npm test
 
 # Type-check
 npx tsc --noEmit
 
-# E2E smoke
-./scripts/chaosengine-cli serve &  # one shell
-./scripts/e2e_test_suite.py --smoke  # another shell
+# E2E smoke — talks to the running app on 127.0.0.1:8876
+.venv/bin/python scripts/e2e_test_suite.py --smoke
+```
+
+### Why the app's extras, not the dev venv?
+
+The dev `.venv` ships with FastAPI + pytest + huggingface-hub but
+deliberately **without** torch / diffusers / mlx / nunchaku /
+sageattention / triattention / vllm. Those heavy packages live in the
+persistent extras directory at:
+
+- Windows: `%LOCALAPPDATA%\ChaosEngineAI\extras\cp{XY}\site-packages`
+- macOS: `~/Library/Application Support/ChaosEngineAI/extras/cp{XY}/site-packages`
+- Linux: `${XDG_DATA_HOME}/ChaosEngineAI/extras/cp{XY}/site-packages`
+
+`tests/conftest.py` auto-discovers that path at pytest collection time
+and adds it to `sys.path` (via [`ensure_extras_on_sys_path`](https://github.com/cryptopoly/ChaosEngineAI/blob/staging/backend_service/runtime_paths.py)),
+so `import torch` in a test resolves against the same wheel a user
+runs. A torch upgrade landing via the in-app installer is reflected in
+the next `pytest` run automatically — no `pip install` dance required.
+
+Set `CHAOSENGINE_TEST_TRACE_EXTRAS=1` to log which extras path got
+prepended for a given run (useful when debugging "is this test
+hitting the install I think it is?").
+
+### Headless dev backend (advanced)
+
+Contributors who want to run the suite without the Tauri shell open
+can stand up the backend headlessly:
+
+```bash
+# One shell — runs the FastAPI app under the dev venv
+.venv/bin/python -m backend_service.app --port 8876
+
+# OR (gets the embedded runtime via Tauri's stage script)
+npm run tauri:dev
 ```
 
+This works, but won't exercise the exact `python-build-standalone`
+binary the desktop bundle ships — for release-blocking validation,
+prefer the production-app path above.
+
 ## Where the tests live
 
 | Path | What's tested |
diff --git a/scripts/cache-strategy-matrix.py b/scripts/cache-strategy-matrix.py
index f5e2c7d..a0f1ab0 100755
--- a/scripts/cache-strategy-matrix.py
+++ b/scripts/cache-strategy-matrix.py
@@ -473,7 +473,22 @@ def main() -> int:
     try:
         caps = probe_backend(args.port)
     except ConnectionError as exc:
+        # The matrix runner is meant to exercise the installed app's
+        # runtime, the same way ``e2e_test_suite.py`` does. A failure to
+        # reach the backend almost always means "the app isn't open" —
+        # surface that clearly instead of just echoing the ConnectionError.
         print(f"  ! {exc}", file=sys.stderr)
+        print("", file=sys.stderr)
+        print(
+            "Open the ChaosEngineAI app and re-run this command — the matrix "
+            "is designed to exercise the production embedded runtime + extras.",
+            file=sys.stderr,
+        )
+        print(
+            f"(advanced: `npm run tauri:dev` or `python -m backend_service.app "
+            f"--port {args.port}` works for dev runs, but won't match the user-install path)",
+            file=sys.stderr,
+        )
         return 3
     print(f"  available strategies: {sorted(caps.available_strategies)}")
     print(f"  dflash={caps.dflash_available} ddtree={caps.ddtree_available} turbo-binary={caps.has_turbo_binary}")
diff --git a/scripts/e2e_test_suite.py b/scripts/e2e_test_suite.py
index 5c77be8..5c302f1 100755
--- a/scripts/e2e_test_suite.py
+++ b/scripts/e2e_test_suite.py
@@ -844,7 +844,28 @@ def main(argv: list[str] | None = None) -> int:
         phase0 = phase_0(cap)
         phases.append(phase0)
         _write_reports(Path(args.report_dir), started, ended, phases, cap)
-        print("[e2e] backend not reachable; aborting", file=sys.stderr, flush=True)
+        # Comprehensive E2E runs against the installed ChaosEngineAI app,
+        # not a custom dev backend — so the actionable hint always points
+        # at "open the app". The headless dev path is mentioned as a
+        # fallback for contributors who already know it exists.
+        print("", file=sys.stderr, flush=True)
+        print(
+            f"[e2e] backend not reachable at http://{_HOST}:{_PORT}/api/health.",
+            file=sys.stderr,
+            flush=True,
+        )
+        print(
+            "[e2e] open the ChaosEngineAI app and re-run this command — the suite "
+            "exercises the production embedded runtime.",
+            file=sys.stderr,
+            flush=True,
+        )
+        print(
+            "[e2e] (advanced: `npm run tauri:dev` or `python -m backend_service.app "
+            f"--port {_PORT}` from .venv works too, but won't match the user-install path)",
+            file=sys.stderr,
+            flush=True,
+        )
         return 2
 
     phases: list[PhaseResult] = []
diff --git a/tests/conftest.py b/tests/conftest.py
new file mode 100644
index 0000000..a117529
--- /dev/null
+++ b/tests/conftest.py
@@ -0,0 +1,58 @@
+"""pytest collection-time hook to make tests resolve against the
+installed ChaosEngineAI app's extras dir.
+
+The dev ``.venv`` deliberately ships **without** torch / diffusers /
+mlx / vllm / nunchaku / sageattention / triattention. Those packages
+live in the persistent extras directory the desktop app populates via
+``/api/setup/install-gpu-bundle`` and friends — the same path the
+production embedded runtime puts on ``PYTHONPATH`` at backend launch.
+
+Installing torch into the dev venv would fork the install state from
+what real users run. So instead of asking developers to ``pip install``
+into ``.venv``, this conftest reuses the production extras dir at
+collection time so:
+
+  - Tests that touch ``torch`` / ``diffusers`` etc. resolve them
+    against the same wheels the user's actual app uses.
+  - "No custom test setup" — open the app, run ``pytest``, you're
+    testing the production install.
+  - A torch upgrade landing via the in-app installer is reflected in
+    the next pytest run automatically.
+  - CI boxes without the extras dir get a silent no-op: tests that
+    require torch will still fail in the same place they did before
+    (the import line), but tests that don't need it run normally.
+
+The append-vs-prepend decision is delegated to
+``ensure_extras_on_sys_path`` — repo-local shims (notably the
+``turboquant_mlx`` adapter that wraps the upstream
+``turboquant-mlx-full`` install) must keep import authority over the
+raw upstream packages, so the helper appends rather than prepends.
+
+This is a pytest-native conftest, not a fixture. The side effect runs
+once when pytest collects ``tests/``, before any test module imports.
+"""
+
+from __future__ import annotations
+
+import os
+import sys
+
+# We import the helper through ``backend_service`` so the editable
+# install of this repo (``pip install -e .``) is what provides the
+# import path. No special bootstrap needed — pytest's rootdir handling
+# already finds ``backend_service`` via the installed package.
+from backend_service.runtime_paths import ensure_extras_on_sys_path
+
+
+_INSERTED = ensure_extras_on_sys_path()
+
+
+# Surface what we wired in when ``CHAOSENGINE_TEST_TRACE_EXTRAS`` is set
+# (pair with ``-s`` if pytest captures the line) so CI logs and local
+# debugging show which extras dir the run pulled from. Silent by default.
+if _INSERTED and os.environ.get("CHAOSENGINE_TEST_TRACE_EXTRAS"):
+    print(
+        f"[conftest] appended extras to sys.path: {[str(p) for p in _INSERTED]}",
+        file=sys.stderr,
+        flush=True,
+    )

From 1cbb9b14fd606bbe87ff3b97bfe6207945c5fe52 Mon Sep 17 00:00:00 2001
From: Cryptopoly <31970407+cryptopoly@users.noreply.github.com>
Date: Sun, 17 May 2026 11:20:37 +0100
Subject: [PATCH 03/15] feat: AcceleratorCard component + catalog (FU-056 Phase
 2)

Reusable card for the six accelerators (nunchaku,
sageattention, dflash-mlx, dflash-cuda, triattention, kvpress).
Three placement variants share one component so the per-feature
surfaces in Phases 3-6 stay in sync without re-implementing the
four states (idle / installing / installed / failed) per surface:

  - card: full banner with title, claim, applies-to, size pill,
    primary action. Lands in the Image / Video Studio runtime
    banners and the Diagnostics Boost Pack.
  - pill: compact horizontal chip with 4-bit-style copy. Lands on
    catalog variant cards in the Discover / Models tabs.
  - row: table form for Diagnostics Boost Pack's scannable view.

State ownership: parent owns the install lifecycle (which package
is in flight, success/failure, captured pip output). The card
only owns the log-expanded toggle. Mirrors the CudaTorchLogPanel
contract so the card is cheap to render in many places without
duplicating polling work.

New catalog (src/components/acceleratorCatalog.ts) is the single
source of truth for each accelerator's pip name, capability flag,
speedup claim, size, install mode, and platform gate. Adding a
seventh accelerator is one entry here, one Phase 1 capability
flag, and one row in the backend's _INSTALLABLE_PIP_PACKAGES.

NativeBackendStatus (src/types/server.ts) extended with the 13
FU-056 Phase 1 fields plus the older vllm/mtplx/ggufMtp fields
that were already on the wire but missing from the TS interface.
All fields optional so a backend running an older build than the
frontend doesn't break the type contract.

Tests (28 new): catalog shape pinning + getAccelerator lookup +
isPlatformCompatible matrix + readInstalled / readVersion /
platformLabel / actionLabelFor branch coverage. Vitest harness
stays at pure-function level - no React Testing Library yet, per
the existing src/components/__tests__/ convention.

CSS: .accelerator-card / -pill / -row variants in styles.css,
matching the existing .torch-upgrade-pill colour vocabulary
(rgba(80, 140, 220, ...) for the not-installed accent,
rgba(80, 180, 100, ...) for installed, --border + --surface
tokens for the chrome).
---
 src/components/AcceleratorCard.tsx            | 323 ++++++++++++++++++
 .../__tests__/AcceleratorCard.test.tsx        | 119 +++++++
 .../__tests__/acceleratorCatalog.test.ts      | 100 ++++++
 src/components/acceleratorCatalog.ts          | 203 +++++++++++
 src/styles.css                                | 201 +++++++++++
 src/types/server.ts                           |  23 ++
 6 files changed, 969 insertions(+)
 create mode 100644 src/components/AcceleratorCard.tsx
 create mode 100644 src/components/__tests__/AcceleratorCard.test.tsx
 create mode 100644 src/components/__tests__/acceleratorCatalog.test.ts
 create mode 100644 src/components/acceleratorCatalog.ts

diff --git a/src/components/AcceleratorCard.tsx b/src/components/AcceleratorCard.tsx
new file mode 100644
index 0000000..f76e875
--- /dev/null
+++ b/src/components/AcceleratorCard.tsx
@@ -0,0 +1,323 @@
+import { useState } from "react";
+
+import type { NativeBackendStatus } from "../types/server";
+import {
+  type AcceleratorMeta,
+  isPlatformCompatible,
+} from "./acceleratorCatalog";
+
+/**
+ * Reusable card for the six accelerators (FU-056 Phase 2).
+ *
+ * Three placement variants share one component so the per-feature
+ * surfaces in Phases 3–6 stay in sync without re-implementing the
+ * four states (idle / installing / installed / failed) per surface:
+ *
+ *   - ``card`` (default) — full-width banner with title, speedup
+ *     claim, applies-to copy, size pill, primary action. Lives in the
+ *     Diagnostics Boost Pack and the Image / Video Studio runtime
+ *     banners.
+ *   - ``pill`` — compact horizontal chip with "🚀 Label +Nx [Install]"
+ *     copy. Lives on catalog variant cards in the Discover / Models
+ *     tabs.
+ *   - ``row`` — table-friendly form with name + applies-to + status +
+ *     action laid out as columns. Used by the Diagnostics Boost Pack
+ *     to render all six accelerators in one scannable view.
+ *
+ * State ownership: the *parent* owns the install lifecycle (which
+ * package is in flight, success / failure of the most recent attempt,
+ * output captured from the install pipe). The card itself only owns
+ * the "log expanded?" toggle. This mirrors the
+ * ``CudaTorchLogPanel`` / ``TorchUpgradePill`` contract — keeps the
+ * card cheap to render in many places without each instance
+ * duplicating polling work.
+ */
+
+export interface AcceleratorCardProps {
+  /** Catalog row for the accelerator this card represents. */
+  meta: AcceleratorMeta;
+  /** Live capability snapshot. Used to read ``meta.capabilityField``
+   * + ``meta.versionField`` for installed-state display. */
+  capabilities: NativeBackendStatus | null;
+  /** Card layout density. Defaults to ``"card"``. */
+  variant?: "card" | "pill" | "row";
+  /** True while *this specific* accelerator's install is in flight.
+   * The parent owns this state; the card just renders accordingly. */
+  installing?: boolean;
+  /** Last error message from a failed install attempt. ``null`` after
+   * a successful install or before any attempt. */
+  installError?: string | null;
+  /** Captured pip output from the last install (success or fail).
+   * Surfaced inside a collapsible ``<details>`` so success runs stay
+   * compact but failures expose the diagnostic. */
+  installOutput?: string | null;
+  /** Fired when the user clicks Install / Retry. Parent should call
+   * ``installPipPackage(meta.pipPackage)`` then ``refreshWorkspace()``. */
+  onInstall: (pipPackage: string) => void;
+  /** Optional click handler for the platform-mismatch tooltip — lets
+   * the parent surface a "this won't run on your hardware" toast. */
+  onPlatformMismatch?: (meta: AcceleratorMeta) => void;
+  /** Force-show the card even when ``platformGate`` says it's
+   * incompatible. The Diagnostics Boost Pack uses this so users can
+   * see every accelerator; per-feature surfaces leave it false. */
+  showIncompatible?: boolean;
+}
+
+/** Exported for unit-test reach: ``true`` iff capabilities reports
+ * this accelerator's flag as ``=== true``. Older backends without
+ * FU-056 fields read as ``false`` (the fields are optional on the
+ * shared TS interface). Never throws. */
+export function readInstalled(
+  meta: AcceleratorMeta,
+  capabilities: NativeBackendStatus | null,
+): boolean {
+  if (!capabilities) return false;
+  const value = capabilities[meta.capabilityField];
+  return value === true;
+}
+
+/** Exported for unit-test reach: returns the version string when the
+ * backend exposed it, else ``null``. ``"0.0.0"`` and other zero-prefix
+ * versions count as present — we don't filter on semver shape. */
+export function readVersion(
+  meta: AcceleratorMeta,
+  capabilities: NativeBackendStatus | null,
+): string | null {
+  if (!capabilities) return null;
+  const value = capabilities[meta.versionField];
+  return typeof value === "string" && value.length > 0 ? value : null;
+}
+
+/** Exported for unit-test reach: human-readable platform requirement. */
+export function platformLabel(gate: AcceleratorMeta["platformGate"]): string {
+  switch (gate) {
+    case "cuda":
+      return "CUDA only";
+    case "apple-silicon":
+      return "Apple Silicon only";
+    case "any":
+      return "Cross-platform";
+  }
+}
+
+/** Exported for unit-test reach: maps the (installed / installing /
+ * failed / idle, sync / async) matrix onto the button copy. Returns
+ * ``null`` when no action button should render (i.e. the install is
+ * already complete). */
+export function actionLabelFor(args: {
+  installed: boolean;
+  installing: boolean;
+  hasError: boolean;
+  installMode: AcceleratorMeta["installMode"];
+}): string | null {
+  if (args.installed) return null;
+  if (args.installing) return "Installing…";
+  if (args.hasError) return "Retry";
+  return args.installMode === "async" ? "Install (background)" : "Install";
+}
+
+export function AcceleratorCard(props: AcceleratorCardProps) {
+  const {
+    meta,
+    capabilities,
+    variant = "card",
+    installing = false,
+    installError = null,
+    installOutput = null,
+    onInstall,
+    onPlatformMismatch,
+    showIncompatible = false,
+  } = props;
+
+  const installed = readInstalled(meta, capabilities);
+  const version = readVersion(meta, capabilities);
+  const compatible = capabilities ? isPlatformCompatible(meta, capabilities) : true;
+  const [logOpen, setLogOpen] = useState<boolean>(Boolean(installError));
+
+  // When the affordance is shown on a platform that physically can't
+  // run the accelerator and the surface isn't a "show everything"
+  // diagnostic — hide it. Cleaner than rendering a disabled card the
+  // user can't act on.
+  if (!compatible && !showIncompatible) {
+    return null;
+  }
+
+  const handleInstall = () => {
+    if (!compatible) {
+      onPlatformMismatch?.(meta);
+      return;
+    }
+    onInstall(meta.pipPackage);
+  };
+
+  const statusBadge = (() => {
+    if (installed) {
+      return (
+        <span className="accelerator-card-status accelerator-card-status-installed">
+          {version ? `✓ v${version}` : "✓ Installed"}
+        </span>
+      );
+    }
+    if (installing) {
+      return (
+        <span className="accelerator-card-status accelerator-card-status-installing">
+          Installing…
+        </span>
+      );
+    }
+    if (installError) {
+      return (
+        <span className="accelerator-card-status accelerator-card-status-failed">
+          Install failed
+        </span>
+      );
+    }
+    return null;
+  })();
+
+  const actionLabel = actionLabelFor({
+    installed,
+    installing,
+    hasError: Boolean(installError),
+    installMode: meta.installMode,
+  });
+
+  if (variant === "pill") {
+    return (
+      <span
+        className={
+          "accelerator-card accelerator-card-pill" +
+          (installed ? " accelerator-card-installed" : "") +
+          (!compatible ? " accelerator-card-incompatible" : "")
+        }
+        data-accelerator-id={meta.id}
+      >
+        <span className="accelerator-card-pill-label">
+          {installed ? "✓ " : "🚀 "}
+          {meta.shortLabel}
+        </span>
+        {!installed && (
+          <button
+            type="button"
+            className="accelerator-card-action accelerator-card-action-pill"
+            onClick={handleInstall}
+            disabled={installing}
+            aria-label={`Install ${meta.label}`}
+          >
+            {actionLabel}
+          </button>
+        )}
+      </span>
+    );
+  }
+
+  if (variant === "row") {
+    return (
+      <tr
+        className={
+          "accelerator-card-row" +
+          (installed ? " accelerator-card-installed" : "") +
+          (!compatible ? " accelerator-card-incompatible" : "")
+        }
+        data-accelerator-id={meta.id}
+      >
+        <td className="accelerator-card-row-label">
+          <strong>{meta.label}</strong>
+          <span className="accelerator-card-row-applies">{meta.appliesTo}</span>
+        </td>
+        <td className="accelerator-card-row-size">{meta.sizeOnDiskLabel}</td>
+        <td className="accelerator-card-row-platform">{platformLabel(meta.platformGate)}</td>
+        <td className="accelerator-card-row-status">{statusBadge}</td>
+        <td className="accelerator-card-row-action">
+          {actionLabel && (
+            <button
+              type="button"
+              className="accelerator-card-action"
+              onClick={handleInstall}
+              disabled={installing || !compatible}
+              title={!compatible ? `Requires: ${platformLabel(meta.platformGate)}` : undefined}
+            >
+              {actionLabel}
+            </button>
+          )}
+        </td>
+      </tr>
+    );
+  }
+
+  // Default: full card.
+  return (
+    <section
+      className={
+        "accelerator-card" +
+        (installed ? " accelerator-card-installed" : "") +
+        (!compatible ? " accelerator-card-incompatible" : "")
+      }
+      data-accelerator-id={meta.id}
+    >
+      <header className="accelerator-card-header">
+        <h3 className="accelerator-card-title">
+          {installed ? "✓ " : "🚀 "}
+          {meta.label}
+        </h3>
+        {statusBadge}
+      </header>
+
+      <p className="accelerator-card-claim">{meta.speedupClaim}</p>
+      <p className="accelerator-card-applies">
+        <span className="accelerator-card-applies-label">Applies to:</span>{" "}
+        {meta.appliesTo}
+      </p>
+
+      <div className="accelerator-card-meta">
+        <span className="accelerator-card-meta-item">{meta.sizeOnDiskLabel}</span>
+        <span className="accelerator-card-meta-item">{platformLabel(meta.platformGate)}</span>
+        <span className="accelerator-card-meta-item">
+          {meta.installMode === "async" ? "Background install" : "Quick install"}
+        </span>
+        <span className="accelerator-card-meta-item accelerator-card-meta-follow-up">
+          {meta.followUp}
+        </span>
+      </div>
+
+      {actionLabel && (
+        <div className="accelerator-card-actions">
+          <button
+            type="button"
+            className="accelerator-card-action accelerator-card-action-primary"
+            onClick={handleInstall}
+            disabled={installing || !compatible}
+            title={!compatible ? `Requires: ${platformLabel(meta.platformGate)}` : undefined}
+          >
+            {actionLabel}
+          </button>
+        </div>
+      )}
+
+      {installError && (
+        <details
+          className="accelerator-card-log"
+          open={logOpen}
+          onToggle={(event) => setLogOpen((event.target as HTMLDetailsElement).open)}
+        >
+          <summary className="accelerator-card-log-summary">
+            Install failure — show output
+          </summary>
+          <p className="accelerator-card-log-error">{installError}</p>
+          {installOutput && (
+            <pre className="accelerator-card-log-output">{installOutput}</pre>
+          )}
+        </details>
+      )}
+
+      {installed && installOutput && !installError && (
+        <details className="accelerator-card-log accelerator-card-log-success">
+          <summary className="accelerator-card-log-summary">
+            Install output
+          </summary>
+          <pre className="accelerator-card-log-output">{installOutput}</pre>
+        </details>
+      )}
+    </section>
+  );
+}
diff --git a/src/components/__tests__/AcceleratorCard.test.tsx b/src/components/__tests__/AcceleratorCard.test.tsx
new file mode 100644
index 0000000..f7da04c
--- /dev/null
+++ b/src/components/__tests__/AcceleratorCard.test.tsx
@@ -0,0 +1,119 @@
+import { describe, expect, it } from "vitest";
+
+import type { NativeBackendStatus } from "../../types/server";
+import {
+  actionLabelFor,
+  platformLabel,
+  readInstalled,
+  readVersion,
+} from "../AcceleratorCard";
+import { ACCELERATOR_CATALOG, getAccelerator } from "../acceleratorCatalog";
+
+/**
+ * No JSX render harness in the repo today (per
+ * src/components/__tests__/ErrorBoundary.test.ts comment). We pin the
+ * card's *pure-function* contract instead — the same helpers the
+ * component body calls, exported for direct test reach.
+ */
+
+function makeCaps(overrides: Partial<NativeBackendStatus> = {}): NativeBackendStatus {
+  return {
+    pythonExecutable: "/x/python",
+    mlxAvailable: false,
+    mlxLmAvailable: false,
+    mlxUsable: false,
+    ggufAvailable: false,
+    converterAvailable: false,
+    ...overrides,
+  };
+}
+
+describe("readInstalled", () => {
+  const nunchaku = getAccelerator("nunchaku")!;
+
+  it("returns false when capabilities is null", () => {
+    expect(readInstalled(nunchaku, null)).toBe(false);
+  });
+
+  it("returns false when the field is missing (older backend)", () => {
+    expect(readInstalled(nunchaku, makeCaps())).toBe(false);
+  });
+
+  it("returns true when the capability field is true", () => {
+    expect(readInstalled(nunchaku, makeCaps({ nunchakuAvailable: true }))).toBe(true);
+  });
+
+  it("returns false when the capability field is explicitly false", () => {
+    expect(readInstalled(nunchaku, makeCaps({ nunchakuAvailable: false }))).toBe(false);
+  });
+});
+
+describe("readVersion", () => {
+  const nunchaku = getAccelerator("nunchaku")!;
+
+  it("returns null when capabilities is null", () => {
+    expect(readVersion(nunchaku, null)).toBeNull();
+  });
+
+  it("returns null when the version field is missing or empty", () => {
+    expect(readVersion(nunchaku, makeCaps())).toBeNull();
+    expect(readVersion(nunchaku, makeCaps({ nunchakuVersion: "" }))).toBeNull();
+  });
+
+  it("returns the version string when present", () => {
+    expect(readVersion(nunchaku, makeCaps({ nunchakuVersion: "1.2.1" }))).toBe("1.2.1");
+  });
+
+  it("returns null when the version is explicitly null", () => {
+    expect(readVersion(nunchaku, makeCaps({ nunchakuVersion: null }))).toBeNull();
+  });
+});
+
+describe("platformLabel", () => {
+  it("maps every gate to a human-readable string", () => {
+    expect(platformLabel("cuda")).toBe("CUDA only");
+    expect(platformLabel("apple-silicon")).toBe("Apple Silicon only");
+    expect(platformLabel("any")).toBe("Cross-platform");
+  });
+
+  it("covers every catalog platformGate value", () => {
+    // Pins that every catalog entry uses a gate platformLabel knows
+    // how to render — a new gate value would force this test to fail.
+    for (const entry of ACCELERATOR_CATALOG) {
+      const label = platformLabel(entry.platformGate);
+      expect(label.length).toBeGreaterThan(0);
+    }
+  });
+});
+
+describe("actionLabelFor", () => {
+  it("returns null when already installed (no button rendered)", () => {
+    expect(
+      actionLabelFor({ installed: true, installing: false, hasError: false, installMode: "sync" }),
+    ).toBeNull();
+  });
+
+  it("returns ``Installing…`` mid-flight (overrides error)", () => {
+    expect(
+      actionLabelFor({ installed: false, installing: true, hasError: true, installMode: "sync" }),
+    ).toBe("Installing…");
+  });
+
+  it("returns ``Retry`` after a failed attempt", () => {
+    expect(
+      actionLabelFor({ installed: false, installing: false, hasError: true, installMode: "sync" }),
+    ).toBe("Retry");
+  });
+
+  it("returns ``Install`` for fresh sync installs", () => {
+    expect(
+      actionLabelFor({ installed: false, installing: false, hasError: false, installMode: "sync" }),
+    ).toBe("Install");
+  });
+
+  it("returns ``Install (background)`` for async installs", () => {
+    expect(
+      actionLabelFor({ installed: false, installing: false, hasError: false, installMode: "async" }),
+    ).toBe("Install (background)");
+  });
+});
diff --git a/src/components/__tests__/acceleratorCatalog.test.ts b/src/components/__tests__/acceleratorCatalog.test.ts
new file mode 100644
index 0000000..bdfa5be
--- /dev/null
+++ b/src/components/__tests__/acceleratorCatalog.test.ts
@@ -0,0 +1,100 @@
+import { describe, expect, it } from "vitest";
+
+import type { NativeBackendStatus } from "../../types/server";
+import {
+  ACCELERATOR_CATALOG,
+  type AcceleratorMeta,
+  getAccelerator,
+  isPlatformCompatible,
+} from "../acceleratorCatalog";
+
+/**
+ * The catalog is the source of truth for "which accelerators exist".
+ * Tests pin its shape so a typo in a pip-package name, a missing
+ * capability field, or a stale entry can't ship silently — every
+ * downstream surface (Phase 3-6) reads this registry verbatim.
+ */
+
+describe("ACCELERATOR_CATALOG", () => {
+  it("ships exactly the six accelerators FU-056 Phase 1 wired probes for", () => {
+    const ids = ACCELERATOR_CATALOG.map((entry) => entry.id).sort();
+    expect(ids).toEqual([
+      "dflash-cuda",
+      "dflash-mlx",
+      "kvpress",
+      "nunchaku",
+      "sageattention",
+      "triattention",
+    ]);
+  });
+
+  it.each(ACCELERATOR_CATALOG.map((entry) => [entry.id, entry]))(
+    "%s catalog entry has all required fields",
+    (_id, entry) => {
+      expect(entry.label.length).toBeGreaterThan(0);
+      expect(entry.shortLabel.length).toBeGreaterThan(0);
+      expect(entry.pipPackage.length).toBeGreaterThan(0);
+      expect(entry.capabilityField.length).toBeGreaterThan(0);
+      expect(entry.versionField.length).toBeGreaterThan(0);
+      expect(entry.speedupClaim.length).toBeGreaterThan(0);
+      expect(entry.appliesTo.length).toBeGreaterThan(0);
+      expect(entry.sizeOnDiskLabel.length).toBeGreaterThan(0);
+      expect(["sync", "async"]).toContain(entry.installMode);
+      expect(["cuda", "apple-silicon", "any"]).toContain(entry.platformGate);
+      // FU row reference must look like "FU-NNN" (followUp string can
+      // pair multiple FUs separated by "/", e.g. "FU-003 / FU-002").
+      expect(entry.followUp).toMatch(/FU-\d{3}/);
+    },
+  );
+
+  it("capability field names follow the Phase 1 ``*Available`` convention", () => {
+    for (const entry of ACCELERATOR_CATALOG) {
+      expect(entry.capabilityField).toMatch(/Available$/);
+      expect(entry.versionField).toMatch(/Version$/);
+    }
+  });
+
+  it("getAccelerator resolves known ids", () => {
+    expect(getAccelerator("nunchaku")?.label).toBe("Nunchaku");
+    expect(getAccelerator("sageattention")?.label).toBe("SageAttention");
+    expect(getAccelerator("dflash-cuda")?.platformGate).toBe("cuda");
+    expect(getAccelerator("dflash-mlx")?.platformGate).toBe("apple-silicon");
+  });
+
+  it("getAccelerator returns undefined for unknown ids", () => {
+    expect(getAccelerator("flash-attn-3")).toBeUndefined();
+    expect(getAccelerator("")).toBeUndefined();
+  });
+});
+
+describe("isPlatformCompatible", () => {
+  const cudaCaps = { mlxAvailable: false } as Pick<
+    NativeBackendStatus,
+    "mlxAvailable"
+  >;
+  const mlxCaps = { mlxAvailable: true } as Pick<
+    NativeBackendStatus,
+    "mlxAvailable"
+  >;
+
+  it("``any`` platform-gated entries are always compatible", () => {
+    const fake: AcceleratorMeta = {
+      ...ACCELERATOR_CATALOG[0],
+      platformGate: "any",
+    };
+    expect(isPlatformCompatible(fake, cudaCaps)).toBe(true);
+    expect(isPlatformCompatible(fake, mlxCaps)).toBe(true);
+  });
+
+  it("``cuda`` entries match when mlx is unavailable", () => {
+    const nunchaku = ACCELERATOR_CATALOG.find((e) => e.id === "nunchaku")!;
+    expect(isPlatformCompatible(nunchaku, cudaCaps)).toBe(true);
+    expect(isPlatformCompatible(nunchaku, mlxCaps)).toBe(false);
+  });
+
+  it("``apple-silicon`` entries match when mlx is available", () => {
+    const dflashMlx = ACCELERATOR_CATALOG.find((e) => e.id === "dflash-mlx")!;
+    expect(isPlatformCompatible(dflashMlx, cudaCaps)).toBe(false);
+    expect(isPlatformCompatible(dflashMlx, mlxCaps)).toBe(true);
+  });
+});
diff --git a/src/components/acceleratorCatalog.ts b/src/components/acceleratorCatalog.ts
new file mode 100644
index 0000000..1e35dde
--- /dev/null
+++ b/src/components/acceleratorCatalog.ts
@@ -0,0 +1,203 @@
+/**
+ * Accelerator registry (FU-056 Phase 2).
+ *
+ * Source of truth for the six accelerators (five CUDA-gated, one Apple
+ * Silicon) the in-app install UX surfaces. Each entry pairs a stable
+ * ``id`` with the metadata downstream components need to render a
+ * "Recommended" badge, Install button, "Installed ✓" pill, or Boost Pack row:
+ *
+ *   - ``pipPackage`` — argument to ``POST /api/setup/install-package``.
+ *     Must match a key in the backend's ``_INSTALLABLE_PIP_PACKAGES``
+ *     allow-list ([backend_service/routes/setup/__init__.py]).
+ *   - ``capabilityField`` / ``versionField`` — the ``NativeBackendStatus``
+ *     keys to read for installed state + display version. Wired in
+ *     FU-056 Phase 1 on the backend.
+ *   - ``speedupClaim`` / ``appliesTo`` — copy for the "🚀 Nunchaku +3×
+ *     available" pill. Marketing-honest: never claim more than the
+ *     model card / upstream benchmark reports for the *typical* case
+ *     a user will hit.
+ *   - ``sizeOnDiskLabel`` — rough human-readable on-disk footprint
+ *     (compressed download + extracted wheel). Sourced from the
+ *     CLAUDE.md FU rows that registered each package.
+ *   - ``installMode`` — ``"sync"`` for ~5 min installs that we can hold
+ *     a single HTTP call open for; ``"async"`` for the >5 min builds
+ *     (triattention compiles flash-attn from source; vLLM ships a
+ *     ~2 GB wheel) that need the background-job + poll-status shape.
+ *   - ``platformGate`` — when set, the affordance hides on platforms
+ *     where the accelerator can't run at all (e.g. dflash-mlx on
+ *     Windows, vLLM native on macOS). Diagnostic surfaces that show
+ *     "everything" can override this to render a disabled row with
+ *     an explanation.
+ *
+ * Adding a 7th accelerator is one entry here + one Phase 1 capability
+ * flag + one row in ``_INSTALLABLE_PIP_PACKAGES``. No component edit.
+ */
+
+import type { NativeBackendStatus } from "../types/server";
+
+export type AcceleratorId =
+  | "nunchaku"
+  | "sageattention"
+  | "dflash-mlx"
+  | "dflash-cuda"
+  | "triattention"
+  | "kvpress";
+
+export type PlatformGate = "cuda" | "apple-silicon" | "any";
+
+export interface AcceleratorMeta {
+  id: AcceleratorId;
+  /** Human-readable label shown in cards + Boost Pack rows. */
+  label: string;
+  /** Short noun phrase suitable for a pill: "4-bit FLUX/SD3" not "Adds 4-bit support". */
+  shortLabel: string;
+  /** Pip name as it appears in ``_INSTALLABLE_PIP_PACKAGES``. */
+  pipPackage: string;
+  /** Capability flag on ``NativeBackendStatus`` (FU-056 Phase 1). */
+  capabilityField: keyof NativeBackendStatus;
+  /** Version string field (may be ``null`` when installed without a __version__). */
+  versionField: keyof NativeBackendStatus;
+  /** One-line copy explaining the speedup. Used in the "🚀 X available" pill. */
+  speedupClaim: string;
+  /** Models / pipelines this accelerator applies to. Free-text — humans read this. */
+  appliesTo: string;
+  /** Rough on-disk footprint label, e.g. "~50 MB". */
+  sizeOnDiskLabel: string;
+  /** ``sync`` = one HTTP call held open; ``async`` = background job + status poll. */
+  installMode: "sync" | "async";
+  /** Platforms where this can actually run. Affordances hide on the wrong platform. */
+  platformGate: PlatformGate;
+  /** FU row in CLAUDE.md that registered or owns this accelerator. For provenance. */
+  followUp: string;
+  /** Optional doc link slug under ``docs/features/`` for a "Learn more" affordance. */
+  docsSlug?: string;
+}
+
+export const ACCELERATOR_CATALOG: ReadonlyArray<AcceleratorMeta> = [
+  {
+    id: "nunchaku",
+    label: "Nunchaku",
+    shortLabel: "SVDQuant 4-bit",
+    pipPackage: "nunchaku",
+    capabilityField: "nunchakuAvailable",
+    versionField: "nunchakuVersion",
+    speedupClaim: "≈3× faster FLUX/SD3.5/Qwen-Image on CUDA",
+    appliesTo: "FLUX.1, SD3.5, Qwen-Image, SANA, PixArt-Σ",
+    sizeOnDiskLabel: "~50 MB",
+    installMode: "sync",
+    platformGate: "cuda",
+    followUp: "FU-023",
+  },
+  {
+    id: "sageattention",
+    label: "SageAttention",
+    shortLabel: "Fast attention DiT",
+    pipPackage: "sageattention",
+    capabilityField: "sageattentionAvailable",
+    versionField: "sageattentionVersion",
+    speedupClaim: "Stacks with FBCache for ~1.4× extra on DiT pipelines",
+    appliesTo: "Any CUDA DiT image / video pipeline",
+    sizeOnDiskLabel: "~30 MB",
+    installMode: "sync",
+    platformGate: "cuda",
+    followUp: "FU-016",
+  },
+  {
+    id: "dflash-mlx",
+    label: "DFlash (MLX)",
+    shortLabel: "Speculative decoding",
+    pipPackage: "dflash-mlx",
+    capabilityField: "dflashMlxAvailable",
+    versionField: "dflashMlxVersion",
+    speedupClaim: "≈1.5-2× tokens/sec on Qwen3.x and DeepSeek chat models",
+    appliesTo: "Apple Silicon — any LLM with a registered draft model",
+    sizeOnDiskLabel: "~80 MB",
+    installMode: "sync",
+    platformGate: "apple-silicon",
+    followUp: "FU-031",
+    docsSlug: "dflash",
+  },
+  {
+    id: "dflash-cuda",
+    label: "DFlash (CUDA)",
+    shortLabel: "Speculative decoding",
+    pipPackage: "dflash",
+    capabilityField: "dflashCudaAvailable",
+    versionField: "dflashCudaVersion",
+    speedupClaim: "≈1.5-2× tokens/sec on Qwen3.x and DeepSeek chat models",
+    appliesTo: "CUDA — any LLM with a registered draft model",
+    sizeOnDiskLabel: "~80 MB",
+    installMode: "sync",
+    platformGate: "cuda",
+    followUp: "FU-048",
+    docsSlug: "dflash",
+  },
+  {
+    id: "triattention",
+    label: "TriAttention",
+    shortLabel: "KV compressor + LongLive",
+    // The full pip git+url is resolved server-side by the install-package
+    // registry — the client only needs the package name as the registry
+    // key. Keeps the catalog readable + avoids leaking the upstream pin
+    // into the frontend bundle.
+    pipPackage: "triattention",
+    capabilityField: "triattentionAvailable",
+    versionField: "triattentionVersion",
+    speedupClaim: "Real-time long Wan video + 2-3× KV compression on long-context LLMs",
+    appliesTo: "Wan 2.1 1.3B (LongLive), long-context chat models",
+    sizeOnDiskLabel: "~2 GB (pulls vllm)",
+    installMode: "async",
+    platformGate: "cuda",
+    followUp: "FU-003 / FU-002",
+  },
+  {
+    id: "kvpress",
+    label: "kvpress",
+    shortLabel: "KV cache compression",
+    pipPackage: "kvpress",
+    capabilityField: "kvpressAvailable",
+    versionField: "kvpressVersion",
+    speedupClaim: "8-32× KV-cache compression on long-context CUDA inference",
+    appliesTo: "CUDA — any HF transformer with KV cache",
+    sizeOnDiskLabel: "~40 MB",
+    installMode: "sync",
+    platformGate: "cuda",
+    followUp: "FU-027",
+  },
+];
+
+/** Lookup an entry by id. Returns ``undefined`` for unknown ids so the
+ * caller can render a "missing catalog row" diagnostic rather than
+ * crashing — relevant for forward-compat when a backend probe lists a
+ * new accelerator the frontend doesn't know about yet. */
+export function getAccelerator(id: string): AcceleratorMeta | undefined {
+  return ACCELERATOR_CATALOG.find((entry) => entry.id === id);
+}
+
+/** True when this accelerator's ``platformGate`` is satisfied by the
+ * current ``NativeBackendStatus``. The caller can use this to hide
+ * irrelevant cards (e.g. dflash-mlx on Windows) or to dim them with an
+ * explanation tooltip. ``any`` always satisfies. */
+export function isPlatformCompatible(
+  meta: AcceleratorMeta,
+  capabilities: Pick<NativeBackendStatus, "mlxAvailable">,
+): boolean {
+  switch (meta.platformGate) {
+    case "any":
+      return true;
+    case "apple-silicon":
+      // ``mlxAvailable`` is the strongest signal we have for "this is an
+      // Apple Silicon box where MLX worker subprocesses can spawn".
+      // ``platform.system() == "Darwin"`` would catch Intel Macs too, but
+      // none of the MLX-side accelerators run on Intel anyway, so MLX
+      // availability is the better gate.
+      return Boolean(capabilities.mlxAvailable);
+    case "cuda":
+      // We don't have a single ``cudaAvailable`` capability flag today
+      // (the vllm probe carries it implicitly). For Phase 2 we approximate
+      // "this is a CUDA box" with "MLX is NOT available" — i.e. not an
+      // Apple Silicon box. A more precise probe lands in Phase 8 alongside
+      // the WSL bridge work, when we surface ``cudaAvailable`` explicitly.
+      return !capabilities.mlxAvailable;
+  }
+}
diff --git a/src/styles.css b/src/styles.css
index 5cc83bc..10b6647 100644
--- a/src/styles.css
+++ b/src/styles.css
@@ -7782,6 +7782,207 @@ select.text-input {
   overflow: auto;
 }
 
+/* -- FU-056 accelerator card --------------------------------------- */
+.accelerator-card {
+  display: flex;
+  flex-direction: column;
+  gap: 8px;
+  padding: 12px 14px;
+  background: var(--surface);
+  border: 1px solid var(--border);
+  border-radius: var(--radius-md);
+  font-size: 0.88rem;
+}
+.accelerator-card.accelerator-card-installed {
+  background: rgba(80, 180, 100, 0.06);
+  border-color: rgba(80, 180, 100, 0.32);
+}
+.accelerator-card.accelerator-card-incompatible {
+  opacity: 0.55;
+}
+.accelerator-card-header {
+  display: flex;
+  align-items: center;
+  justify-content: space-between;
+  gap: 10px;
+}
+.accelerator-card-title {
+  margin: 0;
+  font-size: 0.95rem;
+  font-weight: 600;
+}
+.accelerator-card-claim {
+  margin: 0;
+  color: var(--muted-strong);
+  font-size: 0.86rem;
+}
+.accelerator-card-applies {
+  margin: 0;
+  color: var(--muted);
+  font-size: 0.82rem;
+}
+.accelerator-card-applies-label {
+  color: var(--muted-strong);
+  font-weight: 500;
+}
+.accelerator-card-meta {
+  display: flex;
+  flex-wrap: wrap;
+  gap: 6px 10px;
+  font-size: 0.78rem;
+  color: var(--muted);
+  margin-top: 2px;
+}
+.accelerator-card-meta-item {
+  display: inline-flex;
+  align-items: center;
+  padding: 2px 7px;
+  background: rgba(255, 255, 255, 0.04);
+  border: 1px solid rgba(255, 255, 255, 0.06);
+  border-radius: 999px;
+  font-size: 0.74rem;
+}
+.accelerator-card-meta-follow-up {
+  font-family: ui-monospace, SFMono-Regular, Menlo, Consolas, monospace;
+  color: var(--accent);
+}
+.accelerator-card-status {
+  display: inline-block;
+  padding: 2px 8px;
+  font-size: 0.72rem;
+  font-weight: 600;
+  border-radius: 10px;
+  letter-spacing: 0.02em;
+}
+.accelerator-card-status-installed {
+  background: rgba(80, 180, 100, 0.22);
+  color: #8fd99e;
+}
+.accelerator-card-status-installing {
+  background: rgba(220, 170, 80, 0.22);
+  color: #e7c382;
+}
+.accelerator-card-status-failed {
+  background: rgba(220, 100, 100, 0.25);
+  color: #ec9c9c;
+}
+.accelerator-card-actions {
+  display: flex;
+  flex-wrap: wrap;
+  gap: 8px;
+  align-items: center;
+  margin-top: 4px;
+}
+.accelerator-card-action {
+  appearance: none;
+  border: 1px solid rgba(143, 180, 255, 0.45);
+  background: rgba(80, 140, 220, 0.18);
+  color: var(--accent-strong);
+  border-radius: 8px;
+  padding: 5px 12px;
+  font-size: 0.84rem;
+  font-weight: 500;
+  cursor: pointer;
+  transition: background 0.12s ease;
+}
+.accelerator-card-action:hover:not(:disabled) {
+  background: rgba(80, 140, 220, 0.30);
+}
+.accelerator-card-action:disabled {
+  opacity: 0.55;
+  cursor: not-allowed;
+}
+.accelerator-card-action-primary {
+  font-weight: 600;
+}
+.accelerator-card-log {
+  margin-top: 4px;
+  font-size: 0.82rem;
+}
+.accelerator-card-log-summary {
+  cursor: pointer;
+  color: var(--muted-strong);
+}
+.accelerator-card-log-error {
+  margin: 6px 0 4px 0;
+  color: #ec9c9c;
+  font-size: 0.82rem;
+}
+.accelerator-card-log-output {
+  margin: 0;
+  padding: 8px 10px;
+  background: rgba(0, 0, 0, 0.35);
+  color: rgba(226, 232, 240, 0.92);
+  border-radius: 6px;
+  font-family: ui-monospace, SFMono-Regular, Menlo, Consolas, monospace;
+  font-size: 0.74rem;
+  white-space: pre-wrap;
+  overflow-wrap: anywhere;
+  max-height: 240px;
+  overflow: auto;
+}
+
+/* Compact pill variant — for catalog variant cards (Phase 3 / 4). */
+.accelerator-card-pill {
+  display: inline-flex;
+  align-items: center;
+  gap: 6px;
+  padding: 3px 8px 3px 9px;
+  background: rgba(80, 140, 220, 0.12);
+  border: 1px solid rgba(80, 140, 220, 0.32);
+  border-radius: 999px;
+  font-size: 0.78rem;
+  flex-direction: row;
+}
+.accelerator-card-pill.accelerator-card-installed {
+  background: rgba(80, 180, 100, 0.10);
+  border-color: rgba(80, 180, 100, 0.32);
+}
+.accelerator-card-pill-label {
+  font-weight: 500;
+  color: var(--text);
+}
+.accelerator-card-action-pill {
+  padding: 1px 8px;
+  font-size: 0.72rem;
+  border-radius: 999px;
+}
+
+/* Table-row variant — for the Diagnostics Boost Pack (Phase 6). */
+tr.accelerator-card-row {
+  border-bottom: 1px solid rgba(255, 255, 255, 0.04);
+}
+tr.accelerator-card-row.accelerator-card-installed {
+  background: rgba(80, 180, 100, 0.04);
+}
+tr.accelerator-card-row.accelerator-card-incompatible {
+  opacity: 0.55;
+}
+.accelerator-card-row-label {
+  padding: 8px 10px;
+  display: flex;
+  flex-direction: column;
+  gap: 2px;
+}
+.accelerator-card-row-label strong {
+  font-weight: 600;
+}
+.accelerator-card-row-applies {
+  color: var(--muted);
+  font-size: 0.78rem;
+}
+.accelerator-card-row-size,
+.accelerator-card-row-platform {
+  padding: 8px 10px;
+  color: var(--muted);
+  font-size: 0.82rem;
+}
+.accelerator-card-row-status,
+.accelerator-card-row-action {
+  padding: 8px 10px;
+  text-align: right;
+}
+
 /* -- Diagnostics panel --------------------------------------------- */
 .diagnostics-body {
   display: flex;
diff --git a/src/types/server.ts b/src/types/server.ts
index d5009d4..3d1c1e5 100644
--- a/src/types/server.ts
+++ b/src/types/server.ts
@@ -122,5 +122,28 @@ export interface NativeBackendStatus {
   llamaServerPath?: string | null;
   llamaServerTurboPath?: string | null;
   converterAvailable: boolean;
+  // FU-047 + downstream — already on the wire, kept optional so older
+  // backends without these fields don't break the TS contract.
+  vllmAvailable?: boolean;
+  vllmVersion?: string | null;
+  mtplxAvailable?: boolean;
+  mtplxPythonPath?: string | null;
+  ggufMtpAvailable?: boolean;
+  // FU-056 Phase 1 — per-accelerator import probes. Optional so a
+  // backend running an older build than the frontend doesn't crash the
+  // capability-readers; consumers should treat missing as ``false``.
+  nunchakuAvailable?: boolean;
+  nunchakuVersion?: string | null;
+  sageattentionAvailable?: boolean;
+  sageattentionVersion?: string | null;
+  dflashMlxAvailable?: boolean;
+  dflashMlxVersion?: string | null;
+  dflashCudaAvailable?: boolean;
+  dflashCudaVersion?: string | null;
+  triattentionAvailable?: boolean;
+  triattentionVersion?: string | null;
+  kvpressAvailable?: boolean;
+  kvpressVersion?: string | null;
+  wsl2Available?: boolean;
   probing?: boolean;
 }

From b00b40991fcfdd8164125912c1565ac49dd74270 Mon Sep 17 00:00:00 2001
From: Cryptopoly <31970407+cryptopoly@users.noreply.github.com>
Date: Sun, 17 May 2026 11:42:34 +0100
Subject: [PATCH 04/15] feat: Diagnostics Boost Pack panel (FU-056 Phase 6)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

First end-to-end UX slice for FU-056. The Diagnostics tab gains a
Boost Pack section listing all six catalog accelerators (nunchaku,
sageattention, dflash-mlx, dflash-cuda, triattention, kvpress) as a
single scannable table. Status pill + Install / Retry button per
row; click installs via the existing POST /api/setup/install-package
endpoint, output captured into a collapsible details, then
capabilities re-probe so the "Installed v1.2.1" pill flips without
a parent refetch.

Self-probes capabilities on mount via refreshCapabilities() so the
panel works standalone — DiagnosticsPanel only passes backendOnline.
Per-accelerator install state lives in a record keyed by pip name,
so multiple installs can run concurrently if the user is impatient
(the backend serialises pip writes at the OS-FS layer).
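
The keyed-record shape can be sketched as a pair of pure transitions.
This is an illustration only, not the panel's code: InstallState
mirrors the panel's interface, while InstallStates, startInstall and
finishInstall are hypothetical names introduced for the sketch.

```typescript
// Illustrative sketch: helper names are hypothetical, not the panel's API.
interface InstallState {
  installing: boolean;
  error: string | null;
  output: string | null;
}

type InstallStates = Readonly<Record<string, InstallState>>;

// Mark one package as installing. Returns the same object when an
// install for that package is already in flight, so a double-click
// is a no-op while other packages stay free to start.
function startInstall(states: InstallStates, pipPackage: string): InstallStates {
  if (states[pipPackage]?.installing) return states;
  return {
    ...states,
    [pipPackage]: { installing: true, error: null, output: null },
  };
}

// Record the outcome for one package. Entries for other packages are
// carried over untouched, which is what keeps installs independent.
function finishInstall(
  states: InstallStates,
  pipPackage: string,
  outcome: { ok: boolean; output?: string | null; error?: string },
): InstallStates {
  return {
    ...states,
    [pipPackage]: {
      installing: false,
      error: outcome.ok ? null : outcome.error ?? "Install command exited non-zero.",
      output: outcome.output ?? null,
    },
  };
}
```

Running the guard inside a functional state update (rather than
reading a captured copy of the record) would also close the window
where two clicks land before the next render.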

Renders every catalog row with showIncompatible=true: this is the
"see everything" surface, not a per-feature gate. Apple-Silicon and
CUDA accelerators both list; the platform column tells the user
which apply to their box, and disabled state + tooltip blocks an
ill-fitting install. Phases 3-5 will filter per surface.
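
A rough sketch of that gating, assuming the Phase 2 approximation
that "MLX unavailable" stands in for CUDA. RowAffordance,
gateSatisfied, rowAffordance and the tooltip wording are all
hypothetical; the real AcceleratorCard may decide this differently.

```typescript
// Illustrative sketch: every name here except PlatformGate is hypothetical.
type PlatformGate = "cuda" | "apple-silicon" | "any";

interface RowAffordance {
  visible: boolean;
  installDisabled: boolean;
  tooltip: string | null;
}

// Same approximation the catalog documents: there is no explicit
// CUDA flag yet, so "MLX is not available" stands in for a CUDA box.
function gateSatisfied(gate: PlatformGate, mlxAvailable: boolean): boolean {
  switch (gate) {
    case "any":
      return true;
    case "apple-silicon":
      return mlxAvailable;
    case "cuda":
      return !mlxAvailable;
  }
}

// Per-feature surfaces pass showIncompatible=false and drop the row;
// the Boost Pack passes true and keeps it, disabled, with a reason.
function rowAffordance(
  gate: PlatformGate,
  mlxAvailable: boolean,
  showIncompatible: boolean,
): RowAffordance {
  if (gateSatisfied(gate, mlxAvailable)) {
    return { visible: true, installDisabled: false, tooltip: null };
  }
  if (!showIncompatible) {
    return { visible: false, installDisabled: true, tooltip: null };
  }
  // "any" is always satisfied, so only the two hardware gates reach here.
  const needs = gate === "cuda" ? "a CUDA GPU" : "Apple Silicon";
  return { visible: true, installDisabled: true, tooltip: `Requires ${needs}` };
}
```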

Closes the first observable loop:
  Phase 1 probe → Phase 2 card (row variant) → install → re-probe
  → installed state. The same component renders pill + card + row, so
  the per-feature surfaces in Phases 3-5 reuse the same code.

No new tests — the pure logic (readInstalled, readVersion,
actionLabelFor, platformLabel, isPlatformCompatible) is already
pinned by Phase 2's 28 unit tests. The Boost Pack itself is wiring:
fetch capabilities, dispatch install, re-fetch on success. Mirrors
the existing CudaTorchLogPanel pattern.
---
 .../settings/AcceleratorsBoostPack.tsx        | 197 ++++++++++++++++++
 src/features/settings/DiagnosticsPanel.tsx    |   2 +
 2 files changed, 199 insertions(+)
 create mode 100644 src/features/settings/AcceleratorsBoostPack.tsx

diff --git a/src/features/settings/AcceleratorsBoostPack.tsx b/src/features/settings/AcceleratorsBoostPack.tsx
new file mode 100644
index 0000000..daaaea5
--- /dev/null
+++ b/src/features/settings/AcceleratorsBoostPack.tsx
@@ -0,0 +1,197 @@
+import { useCallback, useEffect, useState } from "react";
+
+import { AcceleratorCard } from "../../components/AcceleratorCard";
+import { ACCELERATOR_CATALOG } from "../../components/acceleratorCatalog";
+import { installPipPackage, refreshCapabilities } from "../../api";
+import type { NativeBackendStatus } from "../../types/server";
+
+/**
+ * The Diagnostics tab's "Boost Pack" section (FU-056 Phase 6).
+ *
+ * Single panel listing every accelerator the catalog registers (CUDA
+ * and Apple Silicon), with current install state + one-click install. The
+ * "everything in one place" surface for users who want to see the
+ * full accelerator landscape; per-feature surfaces (Phases 3-5)
+ * inherit the same ``AcceleratorCard`` component but show only the
+ * accelerators relevant to that tab.
+ *
+ * State ownership
+ * ---------------
+ * This panel self-probes capabilities on mount (``refreshCapabilities``
+ * hits ``/api/setup/refresh-capabilities``) and re-probes after each
+ * successful install so the Installed pills flip without a parent
+ * refetch. Per-accelerator install state lives in ``installStates``
+ * keyed by ``pipPackage`` — the card itself stays stateless beyond
+ * its "log expanded" toggle.
+ *
+ * The panel intentionally renders **every** entry in
+ * ``ACCELERATOR_CATALOG`` regardless of platform (``showIncompatible``
+ * is true). The user-experience choice here: this is the diagnostics
+ * surface, the user wants visibility into what exists across the
+ * ecosystem, not just what their current box can run. Per-feature
+ * surfaces will gate by platform so wrong-platform affordances don't
+ * appear next to a FLUX model card.
+ */
+
+export interface AcceleratorsBoostPackProps {
+  /** Set false until the backend health check has cleared.
+   * Capabilities fetch needs the backend up. */
+  backendOnline: boolean;
+}
+
+interface InstallState {
+  installing: boolean;
+  error: string | null;
+  output: string | null;
+}
+
+const EMPTY_INSTALL_STATE: InstallState = {
+  installing: false,
+  error: null,
+  output: null,
+};
+
+export function AcceleratorsBoostPack({ backendOnline }: AcceleratorsBoostPackProps) {
+  const [capabilities, setCapabilities] = useState<NativeBackendStatus | null>(null);
+  const [capError, setCapError] = useState<string | null>(null);
+  const [installStates, setInstallStates] = useState<Record<string, InstallState>>({});
+
+  const probe = useCallback(async () => {
+    if (!backendOnline) return;
+    try {
+      const next = await refreshCapabilities();
+      // ``refreshCapabilities`` returns a generic ``Record`` because
+      // it serves several consumers; the FU-056 Phase 1 fields are
+      // optional on ``NativeBackendStatus`` so this cast is safe even
+      // when the backend is older than the frontend.
+      setCapabilities(next as unknown as NativeBackendStatus);
+      setCapError(null);
+    } catch (err) {
+      const message = err instanceof Error ? err.message : String(err);
+      setCapError(message);
+    }
+  }, [backendOnline]);
+
+  useEffect(() => {
+    if (backendOnline) {
+      void probe();
+    }
+  }, [backendOnline, probe]);
+
+  const handleInstall = useCallback(
+    async (pipPackage: string) => {
+      // Guard against double-clicks on the same accelerator. Other
+      // accelerators can still install concurrently — the backend's
+      // ``/api/setup/install-package`` endpoint serialises pip writes
+      // for us at the OS-FS layer.
+      const existing = installStates[pipPackage];
+      if (existing?.installing) return;
+
+      setInstallStates((prev) => ({
+        ...prev,
+        [pipPackage]: { installing: true, error: null, output: null },
+      }));
+
+      try {
+        const result = await installPipPackage(pipPackage);
+        if (result.ok) {
+          setInstallStates((prev) => ({
+            ...prev,
+            [pipPackage]: {
+              installing: false,
+              error: null,
+              output: result.output ?? null,
+            },
+          }));
+          await probe();
+        } else {
+          setInstallStates((prev) => ({
+            ...prev,
+            [pipPackage]: {
+              installing: false,
+              error: "Install command exited non-zero.",
+              output: result.output ?? null,
+            },
+          }));
+        }
+      } catch (err) {
+        const message = err instanceof Error ? err.message : String(err);
+        setInstallStates((prev) => ({
+          ...prev,
+          [pipPackage]: {
+            installing: false,
+            error: message,
+            output: null,
+          },
+        }));
+      }
+    },
+    [installStates, probe],
+  );
+
+  // Rows follow the catalog order verbatim so the table is stable
+  // across runs. Platforms are interleaved (CUDA and Apple Silicon
+  // entries are not grouped); the user can scan the Platform column
+  // to decide what their box supports.
+  const rows = ACCELERATOR_CATALOG;
+
+  return (
+    <section className="accelerators-boost-pack" style={{ marginTop: 18 }}>
+      <header className="accelerators-boost-pack-header">
+        <h3 style={{ margin: 0, fontSize: "0.98rem", fontWeight: 600 }}>
+          Boost Pack
+        </h3>
+        <p className="muted-text" style={{ margin: "4px 0 0", fontSize: "0.84rem" }}>
+          Optional accelerators for image, video, and chat inference. Each is a
+          single pip install away — click Install on the rows your hardware
+          supports.
+        </p>
+      </header>
+
+      {capError ? (
+        <p
+          className="muted-text"
+          style={{ color: "rgb(252, 165, 165)", margin: "8px 0 0", fontSize: "0.82rem" }}
+        >
+          Could not read accelerator capabilities: {capError}
+        </p>
+      ) : null}
+
+      {!backendOnline ? (
+        <p className="muted-text" style={{ margin: "8px 0 0", fontSize: "0.82rem" }}>
+          Backend offline — start the sidecar to read accelerator state.
+        </p>
+      ) : null}
+
+      <table className="accelerators-boost-pack-table" style={{ width: "100%", marginTop: 10, borderCollapse: "collapse" }}>
+        <thead>
+          <tr style={{ borderBottom: "1px solid rgba(255, 255, 255, 0.06)", fontSize: "0.76rem", color: "var(--muted)" }}>
+            <th style={{ padding: "6px 10px", textAlign: "left", fontWeight: 500 }}>Accelerator</th>
+            <th style={{ padding: "6px 10px", textAlign: "left", fontWeight: 500 }}>Size</th>
+            <th style={{ padding: "6px 10px", textAlign: "left", fontWeight: 500 }}>Platform</th>
+            <th style={{ padding: "6px 10px", textAlign: "right", fontWeight: 500 }}>Status</th>
+            <th style={{ padding: "6px 10px", textAlign: "right", fontWeight: 500 }}>Action</th>
+          </tr>
+        </thead>
+        <tbody>
+          {rows.map((meta) => {
+            const state = installStates[meta.pipPackage] ?? EMPTY_INSTALL_STATE;
+            return (
+              <AcceleratorCard
+                key={meta.id}
+                meta={meta}
+                capabilities={capabilities}
+                variant="row"
+                installing={state.installing}
+                installError={state.error}
+                installOutput={state.output}
+                onInstall={handleInstall}
+                showIncompatible
+              />
+            );
+          })}
+        </tbody>
+      </table>
+    </section>
+  );
+}
diff --git a/src/features/settings/DiagnosticsPanel.tsx b/src/features/settings/DiagnosticsPanel.tsx
index 88d09f3..41da875 100644
--- a/src/features/settings/DiagnosticsPanel.tsx
+++ b/src/features/settings/DiagnosticsPanel.tsx
@@ -12,6 +12,7 @@ import {
   type InstallResult,
   type StorageTopResponse,
 } from "../../api";
+import { AcceleratorsBoostPack } from "./AcceleratorsBoostPack";
 
 // In-app troubleshooting panel. Surfaces OS, hardware, runtime paths,
 // GPU state, env vars, and the backend log tail without asking users to
@@ -413,6 +414,7 @@ export function DiagnosticsPanel({ backendOnline, onRestartServer, busyAction }:
           </div>
         </div>
       ) : null}
+      <AcceleratorsBoostPack backendOnline={backendOnline} />
       <StorageTopSection backendOnline={backendOnline} />
     </Panel>
   );

From 233b4d7e125f8ff4b46e243eb5adfa26680e1cb4 Mon Sep 17 00:00:00 2001
From: Cryptopoly <31970407+cryptopoly@users.noreply.github.com>
Date: Sun, 17 May 2026 11:57:08 +0100
Subject: [PATCH 05/15] feat: Image Studio accelerator surfaces (FU-056 Phase
 3)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Wires accelerator install affordances into the three Image surfaces
users actually look at when picking + running a model:

  1. Image Models tab — every installed FLUX / SD3.5 / Qwen-Image /
     SANA / PixArt row gets read-only pills next to the style tags:
     "🚀 SVDQuant 4-bit" + "🚀 Fast attention DiT" when the
     accelerator is missing, "✓ ..." when present. UNet pipelines
     (SD1.5 / SDXL) show no pills — neither nunchaku nor
     sageattention applies.
  2. Image Discover tab — same pills on catalog variant cards in
     the same position. Lets users see acceleration potential
     before committing to a download.
  3. Image Studio runtime banner — new "Performance boosters"
     section between the torch-upgrade pill and the model-load
     summary. Card variants of the same accelerators with full
     Install / Retry buttons. Self-contained install state: clicks
     POST /api/setup/install-package, capture the response
     capabilities, and overlay them onto the parent-provided
     snapshot so the card flips to "✓ Installed v..." without
     waiting for the next workspace refetch.
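
The overlay step in item 3 can be sketched roughly as follows. This is
an illustration under stated assumptions, not the shipped code:
CapabilitySnapshot, mergeCapabilities, and the nunchakuAvailable /
sageattentionAvailable field names are hypothetical stand-ins for the
real NativeBackendStatus shape.

```typescript
// Hypothetical capability shape; the real NativeBackendStatus carries more fields.
interface CapabilitySnapshot {
  nunchakuAvailable?: boolean;
  sageattentionAvailable?: boolean;
}

// Spread the parent snapshot first and the install-response overlay second,
// so overlay fields win while untouched parent fields survive.
function mergeCapabilities(
  parent: CapabilitySnapshot | undefined,
  overlay: CapabilitySnapshot | null,
): CapabilitySnapshot | null {
  if (!overlay) return parent ?? null;
  return { ...(parent ?? {}), ...overlay };
}

const parentSnapshot: CapabilitySnapshot = {
  nunchakuAvailable: false,
  sageattentionAvailable: true,
};
// Simulated install response that reports nunchaku as present.
const afterInstall = mergeCapabilities(parentSnapshot, { nunchakuAvailable: true });
console.log(afterInstall?.nunchakuAvailable, afterInstall?.sageattentionAvailable);
```

With no overlay the parent snapshot passes through unchanged, and with
neither present the result is null, which is the case where cards fall
back to their "available" form.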

The pills on the Models / Discover tabs are deliberately
read-only — the install action lives in Studio's runtime banner so
install state stays concentrated. A new optional onInstall prop on
AcceleratorCard drives this: when omitted, the card renders as
passive info.
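
A framework-free sketch of that read-only rule, assuming a reduced
props shape. CardActionProps and showsInstallAction are hypothetical
names for illustration; the real component derives the same flag
inside AcceleratorCard.

```typescript
// Hypothetical, reduced slice of the card props.
interface CardActionProps {
  installed: boolean;
  onInstall?: (pipPackage: string) => void;
}

// The action button is offered only when a handler is wired and the
// accelerator is not installed yet.
function showsInstallAction(props: CardActionProps): boolean {
  const readOnly = props.onInstall === undefined;
  return !readOnly && !props.installed;
}

// Discovery surface: no handler, so the pill stays passive.
console.log(showsInstallAction({ installed: false }));
// Studio banner: handler wired, so the button is offered.
console.log(showsInstallAction({ installed: false, onInstall: () => {} }));
```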

New helper getApplicableAccelerators(repo) maps a model repo to the
accelerator IDs that apply. Pattern-matches on the family slug
(FLUX.1, sd3.5, qwen-image, sana, pixart-sigma) so we don't have to
edit catalog/image_models.py to land this — the catalog-side
recommendedAccelerators metadata pattern is reserved for Phase 7
when the i18n + per-variant overrides land together. 7 new unit
tests pin the matrix (FLUX, SD3.5, Qwen-Image, SANA, PixArt for
nunchaku+sageattention; Wan / HunyuanVideo / LTX / CogVideoX /
Mochi for sageattention-only; Wan2.1-T2V-1.3B for the triattention
LongLive bonus; SDXL / SD1.5 return empty).
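
A condensed sketch of the matching idea, for orientation only. The
function name applicableFor and the shortened needle lists are
assumptions made for this example; the helper in acceleratorCatalog.ts
carries more aliases per family.

```typescript
type AcceleratorId = "nunchaku" | "sageattention" | "triattention";

// Shortened needle lists (assumption); the real helper has more aliases.
const IMAGE_DIT_NEEDLES = ["flux.1", "stable-diffusion-3.5", "qwen-image", "sana", "pixart-sigma"];
const VIDEO_DIT_NEEDLES = ["wan2.1", "wan2.2", "hunyuanvideo", "ltx-video", "cogvideox", "mochi"];

function applicableFor(repo: string | null | undefined): AcceleratorId[] {
  if (!repo) return [];
  const lower = repo.toLowerCase();
  const image = IMAGE_DIT_NEEDLES.some((needle) => lower.includes(needle));
  const video = VIDEO_DIT_NEEDLES.some((needle) => lower.includes(needle));

  const result: AcceleratorId[] = [];
  if (image) result.push("nunchaku");
  // sageattention covers the nunchaku families plus the video DiTs.
  if (image || video) result.push("sageattention");
  // Narrow match: only the Wan 2.1 T2V 1.3B repo gets triattention.
  if (/wan[-.]?2\.1[-_]t2v[-_]1\.3b/.test(lower)) result.push("triattention");
  return result;
}

console.log(applicableFor("black-forest-labs/FLUX.1-dev").join(", "));
console.log(applicableFor("Wan-AI/Wan2.1-T2V-1.3B").join(", "));
console.log(applicableFor("stabilityai/sdxl-turbo").length);
```

Substring matching keeps provider prefixes out of the picture, at the
cost of possible false positives on short needles; the unit tests in
this commit pin the cases that matter.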

NativeBackendStatus threads from App.tsx → ImageModelsTab,
ImageDiscoverTab, ImageStudioTab → ImageStudioRuntimeBanner →
ImageStudioBoosters. The prop is optional everywhere so older
backends without FU-056 Phase 1 fields collapse pills to their
"available" state rather than crashing the tab.

Deferred to a follow-up commit: the post-generation suggestion
toast (fires when a non-Nunchaku FLUX gen takes >12s on CUDA). The
discovery + install surfaces in this commit already give users a
clean path to install accelerators contextually; the toast adds a
nudge but the install affordance is reachable without it.
---
 src/App.tsx                                   |   3 +
 src/components/AcceleratorCard.tsx            |  23 ++-
 .../__tests__/acceleratorCatalog.test.ts      |  83 ++++++++++
 src/components/acceleratorCatalog.ts          |  78 +++++++++
 src/features/images/ImageDiscoverTab.tsx      |  27 ++++
 src/features/images/ImageModelsTab.tsx        |  28 ++++
 src/features/images/ImageStudioBoosters.tsx   | 152 ++++++++++++++++++
 .../images/ImageStudioRuntimeBanner.tsx       |  18 +++
 src/features/images/ImageStudioTab.tsx        |   7 +
 9 files changed, 414 insertions(+), 5 deletions(-)
 create mode 100644 src/features/images/ImageStudioBoosters.tsx

diff --git a/src/App.tsx b/src/App.tsx
index c215221..9bb8e5e 100644
--- a/src/App.tsx
+++ b/src/App.tsx
@@ -1228,6 +1228,7 @@ export default function App() {
         activeImageDownloads={imgState.activeImageDownloads}
         selectedImageVariant={imgState.selectedImageVariant}
         fileRevealLabel={fileRevealLabel}
+        nativeBackends={nativeBackends}
         onActiveTabChange={setActiveTab}
         onOpenImageStudio={imgState.openImageStudio}
         onImageDownload={(repo) => void imgState.handleImageDownload(repo)}
@@ -1244,6 +1245,7 @@ export default function App() {
         imageCatalog={imgState.imageCatalog}
         activeImageDownloads={imgState.activeImageDownloads}
         fileRevealLabel={fileRevealLabel}
+        nativeBackends={nativeBackends}
         onActiveTabChange={setActiveTab}
         onOpenImageStudio={imgState.openImageStudio}
         onImageDownload={(repo) => void imgState.handleImageDownload(repo)}
@@ -1272,6 +1274,7 @@ export default function App() {
         imageBusy={imgState.imageBusy}
         imageBusyLabel={imgState.imageBusyLabel}
         backendOnline={backendOnline}
+        nativeBackends={nativeBackends}
         activeImageDownloads={imgState.activeImageDownloads}
         imagePrompt={imgState.imagePrompt}
         onImagePromptChange={imgState.setImagePrompt}
diff --git a/src/components/AcceleratorCard.tsx b/src/components/AcceleratorCard.tsx
index f76e875..ee05839 100644
--- a/src/components/AcceleratorCard.tsx
+++ b/src/components/AcceleratorCard.tsx
@@ -52,8 +52,14 @@ export interface AcceleratorCardProps {
    * compact but failures expose the diagnostic. */
   installOutput?: string | null;
   /** Fired when the user clicks Install / Retry. Parent should call
-   * ``installPipPackage(meta.pipPackage)`` then ``refreshWorkspace()``. */
-  onInstall: (pipPackage: string) => void;
+   * ``installPipPackage(meta.pipPackage)`` then ``refreshWorkspace()``.
+   *
+   * Optional: when omitted, the card renders **read-only** — status
+   * pill + meta only, no action button. Used by discovery surfaces
+   * (the Image Models / Discover tabs) where the install action lives
+   * in a sibling surface (the Image Studio runtime banner) so the
+   * install state stays in one place rather than scattered. */
+  onInstall?: (pipPackage: string) => void;
   /** Optional click handler for the platform-mismatch tooltip — lets
    * the parent surface a "this won't run on your hardware" toast. */
   onPlatformMismatch?: (meta: AcceleratorMeta) => void;
@@ -142,7 +148,14 @@ export function AcceleratorCard(props: AcceleratorCardProps) {
     return null;
   }
 
+  // Read-only mode: when no ``onInstall`` is wired we render the card
+  // as a passive informational element — no Install button, no Retry,
+  // no platform-mismatch toast. The discovery surfaces use this so
+  // they don't accidentally become install dispatchers.
+  const readOnly = onInstall === undefined;
+
   const handleInstall = () => {
+    if (readOnly) return;
     if (!compatible) {
       onPlatformMismatch?.(meta);
       return;
@@ -196,7 +209,7 @@ export function AcceleratorCard(props: AcceleratorCardProps) {
           {installed ? "✓ " : "🚀 "}
           {meta.shortLabel}
         </span>
-        {!installed && (
+        {!installed && !readOnly && (
           <button
             type="button"
             className="accelerator-card-action accelerator-card-action-pill"
@@ -229,7 +242,7 @@ export function AcceleratorCard(props: AcceleratorCardProps) {
         <td className="accelerator-card-row-platform">{platformLabel(meta.platformGate)}</td>
         <td className="accelerator-card-row-status">{statusBadge}</td>
         <td className="accelerator-card-row-action">
-          {actionLabel && (
+          {!readOnly && actionLabel && (
             <button
               type="button"
               className="accelerator-card-action"
@@ -280,7 +293,7 @@ export function AcceleratorCard(props: AcceleratorCardProps) {
         </span>
       </div>
 
-      {actionLabel && (
+      {!readOnly && actionLabel && (
         <div className="accelerator-card-actions">
           <button
             type="button"
diff --git a/src/components/__tests__/acceleratorCatalog.test.ts b/src/components/__tests__/acceleratorCatalog.test.ts
index bdfa5be..ce93271 100644
--- a/src/components/__tests__/acceleratorCatalog.test.ts
+++ b/src/components/__tests__/acceleratorCatalog.test.ts
@@ -5,6 +5,7 @@ import {
   ACCELERATOR_CATALOG,
   type AcceleratorMeta,
   getAccelerator,
+  getApplicableAccelerators,
   isPlatformCompatible,
 } from "../acceleratorCatalog";
 
@@ -67,6 +68,88 @@ describe("ACCELERATOR_CATALOG", () => {
   });
 });
 
+describe("getApplicableAccelerators", () => {
+  it("returns empty for null / empty / unknown repos", () => {
+    expect(getApplicableAccelerators(null)).toEqual([]);
+    expect(getApplicableAccelerators(undefined)).toEqual([]);
+    expect(getApplicableAccelerators("")).toEqual([]);
+    expect(getApplicableAccelerators("some/random-thing")).toEqual([]);
+  });
+
+  it("returns empty for UNet pipelines (SDXL, SD1.5)", () => {
+    expect(getApplicableAccelerators("stabilityai/stable-diffusion-xl-base-1.0")).toEqual([]);
+    expect(getApplicableAccelerators("runwayml/stable-diffusion-v1-5")).toEqual([]);
+    expect(getApplicableAccelerators("stabilityai/sdxl-turbo")).toEqual([]);
+  });
+
+  it("recommends nunchaku + sageattention for FLUX.1", () => {
+    expect(getApplicableAccelerators("black-forest-labs/FLUX.1-dev")).toEqual([
+      "nunchaku",
+      "sageattention",
+    ]);
+    expect(getApplicableAccelerators("black-forest-labs/FLUX.1-schnell")).toEqual([
+      "nunchaku",
+      "sageattention",
+    ]);
+    expect(getApplicableAccelerators("black-forest-labs/FLUX.1-Kontext-dev")).toEqual([
+      "nunchaku",
+      "sageattention",
+    ]);
+  });
+
+  it("recommends nunchaku + sageattention for SD3.5 / Qwen-Image / SANA / PixArt-Σ", () => {
+    expect(getApplicableAccelerators("stabilityai/stable-diffusion-3.5-large")).toEqual([
+      "nunchaku",
+      "sageattention",
+    ]);
+    expect(getApplicableAccelerators("Qwen/Qwen-Image")).toEqual(["nunchaku", "sageattention"]);
+    expect(getApplicableAccelerators("Qwen/Qwen-Image-2512")).toEqual([
+      "nunchaku",
+      "sageattention",
+    ]);
+    expect(getApplicableAccelerators("Efficient-Large-Model/SANA-1024px")).toEqual([
+      "nunchaku",
+      "sageattention",
+    ]);
+    expect(getApplicableAccelerators("PixArt-alpha/PixArt-Sigma-XL-2-1024-MS")).toEqual([
+      "nunchaku",
+      "sageattention",
+    ]);
+  });
+
+  it("recommends sageattention (only) for video DiTs that nunchaku doesn't cover", () => {
+    expect(getApplicableAccelerators("Wan-AI/Wan2.2-T2V-A14B-Diffusers")).toEqual([
+      "sageattention",
+    ]);
+    expect(getApplicableAccelerators("tencent/HunyuanVideo")).toEqual(["sageattention"]);
+    expect(getApplicableAccelerators("Lightricks/LTX-Video")).toEqual(["sageattention"]);
+    expect(getApplicableAccelerators("THUDM/CogVideoX-5b")).toEqual(["sageattention"]);
+    expect(getApplicableAccelerators("genmo/mochi-1-preview")).toEqual(["sageattention"]);
+  });
+
+  it("adds triattention for the specific Wan2.1 1.3B repo LongLive targets", () => {
+    expect(getApplicableAccelerators("Wan-AI/Wan2.1-T2V-1.3B")).toEqual([
+      "sageattention",
+      "triattention",
+    ]);
+    expect(getApplicableAccelerators("Wan-AI/Wan2.1-T2V-1.3B-Diffusers")).toEqual([
+      "sageattention",
+      "triattention",
+    ]);
+    // Other Wan sizes shouldn't surface triattention yet.
+    expect(getApplicableAccelerators("Wan-AI/Wan2.1-T2V-14B-Diffusers")).toEqual([
+      "sageattention",
+    ]);
+  });
+
+  it("is case-insensitive", () => {
+    expect(getApplicableAccelerators("BLACK-FOREST-LABS/flux.1-DEV")).toEqual([
+      "nunchaku",
+      "sageattention",
+    ]);
+  });
+});
+
 describe("isPlatformCompatible", () => {
   const cudaCaps = { mlxAvailable: false } as Pick<
     NativeBackendStatus,
diff --git a/src/components/acceleratorCatalog.ts b/src/components/acceleratorCatalog.ts
index 1e35dde..9341879 100644
--- a/src/components/acceleratorCatalog.ts
+++ b/src/components/acceleratorCatalog.ts
@@ -174,6 +174,84 @@ export function getAccelerator(id: string): AcceleratorMeta | undefined {
   return ACCELERATOR_CATALOG.find((entry) => entry.id === id);
 }
 
+/** Map a model repo to the accelerators that *apply* to it.
+ *
+ * Pattern-match on the repo slug. Catalog-side metadata (the FU-007
+ * route — per-variant ``recommendedAccelerators: AcceleratorId[]``)
+ * is the eventual home for this mapping, but Phase 3 lands the data
+ * here so the surfaces can ship without a catalog migration. The
+ * patterns reflect upstream model-card scope:
+ *
+ *   - **nunchaku** (FU-023 / nunchaku v1.2.1): FLUX.1 (dev / schnell /
+ *     Tools / Kontext / Krea), SD3.5 (large / medium), Qwen-Image
+ *     (+ 2512), Z-Image (+ Turbo), SANA, PixArt-Σ. NOT SDXL or SD1.5
+ *     — those are UNet pipelines and nunchaku is DiT-only.
+ *   - **sageattention** (FU-016): any CUDA DiT image or video
+ *     pipeline. Includes everything nunchaku covers, plus video DiTs
+ *     (Wan, HunyuanVideo, LTX-Video, CogVideoX, Mochi). UNet
+ *     pipelines no-op.
+ *   - **triattention** (FU-003 + FU-002): Wan 2.1 1.3B real-time
+ *     long video via LongLive. Other Wan sizes don't carry the
+ *     LongLive LoRAs yet.
+ *
+ * Returns the accelerator ids in display-priority order — most
+ * impactful first. Empty array = no DiT accelerator applies
+ * (SDXL / SD1.5 / SVD / non-diffusion repos) → render nothing.
+ */
+export function getApplicableAccelerators(repo: string | null | undefined): AcceleratorId[] {
+  if (!repo) return [];
+  const lower = repo.toLowerCase();
+
+  // Image DiTs that nunchaku covers (CUDA 4-bit SVDQuant).
+  // Match on case-insensitive substring of the family — works across
+  // the various provider prefixes (black-forest-labs/FLUX.1-dev,
+  // stabilityai/stable-diffusion-3.5-large, Qwen/Qwen-Image).
+  const nunchakuFamilies = [
+    "flux.1",
+    "flux1",          // some HF mirror names drop the dot
+    "stable-diffusion-3.5",
+    "sd3.5",
+    "sd35",
+    "qwen-image",
+    "qwen-image-2512",
+    "z-image",
+    "sana",
+    "pixart-sigma",
+    "pixart-σ",
+  ];
+  const matchesNunchaku = nunchakuFamilies.some((needle) => lower.includes(needle));
+
+  // SageAttention applies to any CUDA DiT — superset of nunchaku +
+  // the video DiTs.
+  const videoFamilies = [
+    "wan2.1",
+    "wan2.2",
+    "wan-2.1",
+    "wan-2.2",
+    "hunyuanvideo",
+    "hunyuan-video",
+    "ltx-video",
+    "ltx-2",
+    "cogvideox",
+    "cogvideo",
+    "mochi",
+  ];
+  const matchesVideo = videoFamilies.some((needle) => lower.includes(needle));
+  const matchesSageAttention = matchesNunchaku || matchesVideo;
+
+  // TriAttention specifically targets Wan 2.1 1.3B for LongLive
+  // real-time long-clip mode; other Wan sizes don't carry the LongLive
+  // LoRAs yet. Keep the match narrow until upstream broadens.
+  const matchesTriAttention = /wan[-.]?2\.1[-_]t2v[-_]1\.3b/i.test(lower);
+
+  const result: AcceleratorId[] = [];
+  if (matchesNunchaku) result.push("nunchaku");
+  if (matchesSageAttention) result.push("sageattention");
+  if (matchesTriAttention) result.push("triattention");
+  return result;
+}
+
+
 /** True when this accelerator's ``platformGate`` is satisfied by the
  * current ``NativeBackendStatus``. The caller can use this to hide
  * irrelevant cards (e.g. dflash-mlx on Windows) or to dim them with an
diff --git a/src/features/images/ImageDiscoverTab.tsx b/src/features/images/ImageDiscoverTab.tsx
index a44be50..b562848 100644
--- a/src/features/images/ImageDiscoverTab.tsx
+++ b/src/features/images/ImageDiscoverTab.tsx
@@ -13,6 +13,7 @@ import type {
   ImageDiscoverTaskFilter,
   ImageDiscoverAccessFilter,
 } from "../../types/image";
+import type { NativeBackendStatus } from "../../types/server";
 import {
   compactModelSizeLabel,
   compactReleaseLabel,
@@ -26,6 +27,11 @@ import {
   imageSecondarySizeLabel,
   isGatedImageAccessError,
 } from "../../utils";
+import { AcceleratorCard } from "../../components/AcceleratorCard";
+import {
+  getAccelerator,
+  getApplicableAccelerators,
+} from "../../components/acceleratorCatalog";
 
 type MediaStatusFilter = "all" | "installed" | "not-installed" | "downloading" | "paused" | "failed" | "incomplete";
 type SortDir = "asc" | "desc";
@@ -45,6 +51,10 @@ export interface ImageDiscoverTabProps {
   activeImageDownloads: Record<string, DownloadStatus>;
   selectedImageVariant: ImageModelVariant | null;
   fileRevealLabel: string;
+  /** FU-056 Phase 3: capability snapshot for the accelerator pills
+   * rendered next to each variant. Optional — pre-ready or older
+   * backends collapse pills to their "available" form. */
+  nativeBackends?: NativeBackendStatus;
   onActiveTabChange: (tab: TabId) => void;
   onOpenImageStudio: (modelId?: string) => void;
   onImageDownload: (repo: string) => void;
@@ -211,6 +221,7 @@ export function ImageDiscoverTab({
   activeImageDownloads,
   selectedImageVariant,
   fileRevealLabel,
+  nativeBackends,
   onActiveTabChange,
   onOpenImageStudio,
   onImageDownload,
@@ -584,6 +595,22 @@ export function ImageDiscoverTab({
                               : tLib("imageDiscover.access.open", { defaultValue: "Open" })}
                           </span>
                         ) : null}
+                        {/* FU-056 Phase 3: read-only accelerator pills.
+                            Click-through to install lives in Image Studio's
+                            runtime banner so install state stays in one
+                            place. */}
+                        {getApplicableAccelerators(variant.repo).map((acceleratorId) => {
+                          const meta = getAccelerator(acceleratorId);
+                          if (!meta) return null;
+                          return (
+                            <AcceleratorCard
+                              key={acceleratorId}
+                              meta={meta}
+                              capabilities={nativeBackends ?? null}
+                              variant="pill"
+                            />
+                          );
+                        })}
                       </div>
                     </div>
                     <span>{variant.provider}</span>
diff --git a/src/features/images/ImageModelsTab.tsx b/src/features/images/ImageModelsTab.tsx
index 62d6183..df1ecf2 100644
--- a/src/features/images/ImageModelsTab.tsx
+++ b/src/features/images/ImageModelsTab.tsx
@@ -8,6 +8,7 @@ import type {
   ImageModelVariant,
   TabId,
 } from "../../types";
+import type { NativeBackendStatus } from "../../types/server";
 import {
   compactModelSizeLabel,
   compactReleaseLabel,
@@ -17,6 +18,11 @@ import {
   imagePrimarySizeLabel,
   imageSecondarySizeLabel,
 } from "../../utils";
+import { AcceleratorCard } from "../../components/AcceleratorCard";
+import {
+  getAccelerator,
+  getApplicableAccelerators,
+} from "../../components/acceleratorCatalog";
 
 type InstalledImageSort = "name" | "provider" | "tasks" | "size" | "ram" | "date" | "status";
 type SortDir = "asc" | "desc";
@@ -27,6 +33,11 @@ export interface ImageModelsTabProps {
   imageCatalog: ImageModelFamily[];
   activeImageDownloads: Record<string, DownloadStatus>;
   fileRevealLabel: string;
+  /** FU-056 Phase 3: optional capability snapshot used to drive the
+   * accelerator pills next to each variant. ``undefined`` (older
+   * backends or pre-ready state) collapses every pill to its
+   * "available" form rather than crashing. */
+  nativeBackends?: NativeBackendStatus;
   onActiveTabChange: (tab: TabId) => void;
   onOpenImageStudio: (modelId?: string) => void;
   onImageDownload: (repo: string) => void;
@@ -130,6 +141,7 @@ export function ImageModelsTab({
   imageCatalog,
   activeImageDownloads,
   fileRevealLabel,
+  nativeBackends,
   onActiveTabChange,
   onOpenImageStudio,
   onImageDownload,
@@ -361,6 +373,22 @@ export function ImageModelsTab({
                               {variant.styleTags.slice(0, 4).map((tag) => (
                                 <span key={tag} className="badge subtle">{tag}</span>
                               ))}
+                              {/* FU-056 Phase 3: applicable-accelerator pills.
+                                  Read-only (no install button) — the install
+                                  action lives in the Image Studio runtime
+                                  banner so install state stays in one place. */}
+                              {getApplicableAccelerators(variant.repo).map((acceleratorId) => {
+                                const meta = getAccelerator(acceleratorId);
+                                if (!meta) return null;
+                                return (
+                                  <AcceleratorCard
+                                    key={acceleratorId}
+                                    meta={meta}
+                                    capabilities={nativeBackends ?? null}
+                                    variant="pill"
+                                  />
+                                );
+                              })}
                             </div>
                           </div>
                           <span>{variant.provider}</span>
diff --git a/src/features/images/ImageStudioBoosters.tsx b/src/features/images/ImageStudioBoosters.tsx
new file mode 100644
index 0000000..b9c348a
--- /dev/null
+++ b/src/features/images/ImageStudioBoosters.tsx
@@ -0,0 +1,152 @@
+import { useCallback, useState } from "react";
+
+import { AcceleratorCard } from "../../components/AcceleratorCard";
+import {
+  type AcceleratorId,
+  getAccelerator,
+  getApplicableAccelerators,
+} from "../../components/acceleratorCatalog";
+import { installPipPackage } from "../../api";
+import type { ImageModelVariant } from "../../types";
+import type { NativeBackendStatus } from "../../types/server";
+
+/**
+ * Image Studio "Performance boosters" section (FU-056 Phase 3 / 3d).
+ *
+ * Lives inside ``ImageStudioRuntimeBanner`` between the torch-upgrade
+ * pill and the model-load summary. Renders the accelerator cards that
+ * apply to the currently-selected variant — typically nunchaku +
+ * sageattention on FLUX / SD3.5 / Qwen-Image, nothing on SDXL / SD1.5.
+ *
+ * Self-contained install state: clicking "Install" calls
+ * ``installPipPackage`` directly, captures the result, and overlays
+ * the install response's ``capabilities`` payload onto the parent-
+ * provided ``nativeBackends`` so the card flips to "Installed v…"
+ * without waiting for the next workspace refetch.
+ *
+ * Renders nothing in two cases:
+ *   - ``imageRuntimeStatus.realGenerationAvailable === false`` —
+ *     the user can't generate anything yet; accelerators are moot.
+ *     (Gated by caller in the runtime banner, not this component.)
+ *   - The selected variant has no applicable accelerators
+ *     (SD1.5 / SDXL / non-DiT). The whole section folds away rather
+ *     than rendering an empty "Performance boosters" header.
+ */
+
+export interface ImageStudioBoostersProps {
+  /** The variant currently chosen in the Studio drop-down. Determines
+   * which accelerators are applicable via ``getApplicableAccelerators``. */
+  selectedVariant: ImageModelVariant | null;
+  /** Parent-provided capability snapshot — usually
+   * ``workspace.runtime.nativeBackends``. ``undefined`` (older
+   * backends) collapses every card to its "available" form. */
+  nativeBackends?: NativeBackendStatus;
+}
+
+interface InstallState {
+  installing: boolean;
+  error: string | null;
+  output: string | null;
+}
+
+const EMPTY_INSTALL_STATE: InstallState = {
+  installing: false,
+  error: null,
+  output: null,
+};
+
+export function ImageStudioBoosters({
+  selectedVariant,
+  nativeBackends,
+}: ImageStudioBoostersProps) {
+  // Hold a local capabilities overlay so a fresh install flips the
+  // card state immediately. The parent's ``nativeBackends`` is the
+  // authoritative source; we just merge install responses on top.
+  const [localCaps, setLocalCaps] = useState<NativeBackendStatus | null>(null);
+  const [installStates, setInstallStates] = useState<Record<string, InstallState>>({});
+
+  const handleInstall = useCallback(async (pipPackage: string) => {
+    const existing = installStates[pipPackage];
+    if (existing?.installing) return;
+
+    setInstallStates((prev) => ({
+      ...prev,
+      [pipPackage]: { installing: true, error: null, output: null },
+    }));
+
+    try {
+      const result = await installPipPackage(pipPackage);
+      if (result.ok) {
+        // ``install-package`` re-probes capabilities server-side and
+        // returns the fresh snapshot; we slot it onto the local
+        // overlay so the card flips without a parent refetch.
+        setLocalCaps((result.capabilities as unknown as NativeBackendStatus) ?? null);
+        setInstallStates((prev) => ({
+          ...prev,
+          [pipPackage]: {
+            installing: false,
+            error: null,
+            output: result.output ?? null,
+          },
+        }));
+      } else {
+        setInstallStates((prev) => ({
+          ...prev,
+          [pipPackage]: {
+            installing: false,
+            error: "Install command exited non-zero.",
+            output: result.output ?? null,
+          },
+        }));
+      }
+    } catch (err) {
+      const message = err instanceof Error ? err.message : String(err);
+      setInstallStates((prev) => ({
+        ...prev,
+        [pipPackage]: { installing: false, error: message, output: null },
+      }));
+    }
+  }, [installStates]);
+
+  const repo = selectedVariant?.repo;
+  const applicable: AcceleratorId[] = getApplicableAccelerators(repo);
+
+  if (applicable.length === 0) return null;
+
+  // Merge parent caps with the local install overlay. The overlay
+  // wins per-field so a freshly-installed accelerator's flag flips
+  // green even before the parent re-fetches the workspace.
+  const mergedCaps: NativeBackendStatus | null = localCaps
+    ? { ...(nativeBackends ?? {}), ...localCaps } as NativeBackendStatus
+    : (nativeBackends ?? null);
+
+  return (
+    <section className="image-studio-boosters">
+      <header className="image-studio-boosters-header">
+        <strong style={{ fontSize: "0.92rem" }}>Performance boosters</strong>
+        <span className="muted-text" style={{ fontSize: "0.78rem", marginLeft: 8 }}>
+          for {selectedVariant?.name ?? "the selected model"}
+        </span>
+      </header>
+      <div className="image-studio-boosters-stack" style={{ display: "flex", flexDirection: "column", gap: 8, marginTop: 8 }}>
+        {applicable.map((acceleratorId) => {
+          const meta = getAccelerator(acceleratorId);
+          if (!meta) return null;
+          const state = installStates[meta.pipPackage] ?? EMPTY_INSTALL_STATE;
+          return (
+            <AcceleratorCard
+              key={acceleratorId}
+              meta={meta}
+              capabilities={mergedCaps}
+              variant="card"
+              installing={state.installing}
+              installError={state.error}
+              installOutput={state.output}
+              onInstall={handleInstall}
+            />
+          );
+        })}
+      </div>
+    </section>
+  );
+}
diff --git a/src/features/images/ImageStudioRuntimeBanner.tsx b/src/features/images/ImageStudioRuntimeBanner.tsx
index 3dcb7da..cda9f59 100644
--- a/src/features/images/ImageStudioRuntimeBanner.tsx
+++ b/src/features/images/ImageStudioRuntimeBanner.tsx
@@ -15,6 +15,8 @@ import type {
   GpuBundleJobState,
 } from "../../api";
 import type { ImageModelVariant, ImageRuntimeStatus } from "../../types";
+import type { NativeBackendStatus } from "../../types/server";
+import { ImageStudioBoosters } from "./ImageStudioBoosters";
 
 
 export interface ImageStudioRuntimeBannerProps {
@@ -37,6 +39,10 @@ export interface ImageStudioRuntimeBannerProps {
   installingImageRuntime: boolean;
   gpuBundleJob: GpuBundleJobState | null;
   onInstallImageRuntime: () => void;
+  /** FU-056 Phase 3: capability snapshot for the "Performance
+   * boosters" sub-section. Optional — collapses to the "available"
+   * card state if the backend hasn't probed yet. */
+  nativeBackends?: NativeBackendStatus;
 }
 
 
@@ -62,6 +68,7 @@ export function ImageStudioRuntimeBanner(props: ImageStudioRuntimeBannerProps) {
     installingImageRuntime,
     gpuBundleJob,
     onInstallImageRuntime,
+    nativeBackends,
   } = props;
 
   return (
@@ -173,6 +180,17 @@ export function ImageStudioRuntimeBanner(props: ImageStudioRuntimeBannerProps) {
           busy={busy}
         />
       ) : null}
+      {/* FU-056 Phase 3: per-model accelerator install affordances.
+        * Renders nothing when no accelerators apply to the variant
+        * (SD1.5 / SDXL / non-DiT) or when real generation isn't
+        * available yet (no point installing FLUX accelerators on a
+        * box that can't even run FLUX). */}
+      {imageRuntimeStatus.realGenerationAvailable ? (
+        <ImageStudioBoosters
+          selectedVariant={selectedImageVariant}
+          nativeBackends={nativeBackends}
+        />
+      ) : null}
       {selectedImageVariant && imageRuntimeStatus.realGenerationAvailable ? (
         <div className="image-runtime-summary">
           <p className="muted-text">
diff --git a/src/features/images/ImageStudioTab.tsx b/src/features/images/ImageStudioTab.tsx
index 934cf74..e688975 100644
--- a/src/features/images/ImageStudioTab.tsx
+++ b/src/features/images/ImageStudioTab.tsx
@@ -17,6 +17,7 @@ import type {
   TabId,
   TauriBackendInfo,
 } from "../../types";
+import type { NativeBackendStatus } from "../../types/server";
 import {
   sizeLabel,
   downloadProgressLabel,
@@ -51,6 +52,10 @@ export interface ImageStudioTabProps {
   imageBusy: boolean;
   imageBusyLabel: string | null;
   backendOnline: boolean;
+  /** FU-056 Phase 3: capability snapshot used by the runtime banner's
+   * Performance boosters sub-section to gate Install / Installed pills
+   * on the accelerator cards. */
+  nativeBackends?: NativeBackendStatus;
   activeImageDownloads: Record<string, DownloadStatus>;
   imagePrompt: string;
   onImagePromptChange: (value: string) => void;
@@ -144,6 +149,7 @@ export function ImageStudioTab({
   imageBusy,
   imageBusyLabel,
   backendOnline,
+  nativeBackends,
   activeImageDownloads,
   imagePrompt,
   onImagePromptChange,
@@ -462,6 +468,7 @@ export function ImageStudioTab({
           installingImageRuntime={installingImageRuntime}
           gpuBundleJob={gpuBundleJob}
           onInstallImageRuntime={() => void handleInstallImageRuntime()}
+          nativeBackends={nativeBackends}
         />
       </Panel>
 

From 05e879a790ff855e3a66904f71ec82f46beb0879 Mon Sep 17 00:00:00 2001
From: Cryptopoly <31970407+cryptopoly@users.noreply.github.com>
Date: Sun, 17 May 2026 16:19:23 +0100
Subject: [PATCH 06/15] feat: Video Studio accelerator surfaces (FU-056 Phase
 4)

Mirrors the Image-side wiring from Phase 3 onto the Video tabs:

  1. Video Models tab - every Wan / HunyuanVideo / LTX / CogVideoX /
     Mochi row gets read-only accelerator pills next to the style
     tags. SageAttention applies to all CUDA video DiTs;
     TriAttention surfaces specifically on Wan 2.1 T2V 1.3B for the
     LongLive real-time long-clip mode.
  2. Video Discover tab - same pills on catalog variant cards in
     the same chip-row position.
  3. Video Studio runtime banner - new "Performance boosters"
     section between the torch-upgrade pill and the LongLive
     install row. Full card variants with working Install / Retry
     buttons + collapsible pip output.

Implementation note: the booster section was identical to the
image-side equivalent (same install state machine, same card
rendering, same overlay-on-install-success pattern). Renamed
ImageStudioBoosters -> MediaStudioBoosters and moved to
src/components/ so both surfaces share one file. The component
now takes a minimal {repo, name?} variant slice rather than a
concrete ImageModelVariant / VideoModelVariant - both shapes
carry those fields and the booster logic doesn't need anything
else. One source of truth for the install / overlay / re-probe
dance.
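
As a rough illustration of the structural-typing point above, the
sketch below shows two differently-shaped variants satisfying the same
minimal slice. ImageVariantLike, VideoVariantLike, BoostersVariantSlice
and boostersHeading are hypothetical stand-ins invented for this note,
not the real types; only the "for <name>" fallback string is taken from
the component's header.

```typescript
// Hypothetical, trimmed-down stand-ins for the real variant types.
interface ImageVariantLike {
  repo: string;
  name: string;
  resolution: string; // illustrative image-only field
}

interface VideoVariantLike {
  repo: string;
  name: string;
  frames: number; // illustrative video-only field
}

// Mirrors the minimal {repo, name?} slice described above.
interface BoostersVariantSlice {
  repo: string;
  name?: string;
}

// Same fallback wording the shared header uses when no name is available.
function boostersHeading(variant: BoostersVariantSlice | null): string {
  return `for ${variant?.name ?? "the selected model"}`;
}

const image: ImageVariantLike = {
  repo: "example/image-model",
  name: "Image Model",
  resolution: "1024x1024",
};
const video: VideoVariantLike = {
  repo: "example/video-model",
  name: "Video Model",
  frames: 81,
};

// Both shapes are assignable to the slice because they carry repo + name.
const imageHeading = boostersHeading(image);
const videoHeading = boostersHeading(video);
const emptyHeading = boostersHeading(null);
```

Because the slice only asks for the fields the booster logic reads,
neither Studio has to convert its variant before passing it down.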

NativeBackendStatus threads from App.tsx into VideoDiscoverTab and
VideoModelsTab (for the read-only pills) and through VideoStudioTab ->
VideoStudioRuntimeBanner -> MediaStudioBoosters (for the install
cards). The prop is optional everywhere, so older backends without the
FU-056 Phase 1 fields collapse pills to their "available" state rather
than crashing the tab.

No new tests required - the getApplicableAccelerators repo-pattern
matrix is already pinned by Phase 3's 7 tests, including the relevant
video repos (Wan2.1-T2V-1.3B with the triattention bonus,
Wan2.2-T2V-A14B without, HunyuanVideo, LTX-Video, CogVideoX, Mochi).
MediaStudioBoosters internals match the previous ImageStudioBoosters;
there are no behavioural changes.
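
For the optional nativeBackends prop, the fallback can be pictured
roughly as below. This is a simplified sketch under assumptions:
CapabilitySnapshot, PillState, normalise and pillState are hypothetical
names, the flag keys are illustrative, and the real AcceleratorCard may
track more states than these two.

```typescript
// Hypothetical capability snapshot; the real NativeBackendStatus has more fields.
type CapabilitySnapshot = Record<string, boolean>;
type PillState = "installed" | "available";

// Tabs pass `nativeBackends ?? null`, so an older backend yields null here.
function normalise(nativeBackends?: CapabilitySnapshot): CapabilitySnapshot | null {
  return nativeBackends ?? null;
}

// A missing snapshot or a missing flag both read as "available", never as an error.
function pillState(flag: string, capabilities: CapabilitySnapshot | null): PillState {
  return capabilities?.[flag] === true ? "installed" : "available";
}

const olderBackend = pillState("sageattention", normalise(undefined));
const newerBackend = pillState("sageattention", normalise({ sageattention: true }));
```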
---
 src/App.tsx                                   |  3 +
 .../MediaStudioBoosters.tsx}                  | 63 ++++++++++++-------
 .../images/ImageStudioRuntimeBanner.tsx       |  4 +-
 src/features/video/VideoDiscoverTab.tsx       | 25 ++++++++
 src/features/video/VideoModelsTab.tsx         | 28 +++++++++
 .../video/VideoStudioRuntimeBanner.tsx        | 19 ++++++
 src/features/video/VideoStudioTab.tsx         |  8 +++
 7 files changed, 125 insertions(+), 25 deletions(-)
 rename src/{features/images/ImageStudioBoosters.tsx => components/MediaStudioBoosters.tsx} (68%)

diff --git a/src/App.tsx b/src/App.tsx
index 9bb8e5e..213ef4b 100644
--- a/src/App.tsx
+++ b/src/App.tsx
@@ -1384,6 +1384,7 @@ export default function App() {
         activeVideoDownloads={videoState.activeVideoDownloads}
         selectedVideoVariant={videoState.selectedVideoVariant}
         fileRevealLabel={fileRevealLabel}
+        nativeBackends={nativeBackends}
         longLiveStatus={videoState.longLiveStatus}
         installingLongLive={videoState.installingLongLive}
         longLiveJob={videoState.longLiveJob}
@@ -1409,6 +1410,7 @@ export default function App() {
         videoBusyLabel={videoState.videoBusyLabel}
         loadedVideoVariant={videoState.loadedVideoVariant}
         fileRevealLabel={fileRevealLabel}
+        nativeBackends={nativeBackends}
         onActiveTabChange={setActiveTab}
         onOpenVideoStudio={videoState.openVideoStudio}
         onVideoDownload={(repo, modelId) => void videoState.handleVideoDownload(repo, modelId)}
@@ -1434,6 +1436,7 @@ export default function App() {
         loadedVideoVariant={videoState.loadedVideoVariant}
         videoRuntimeStatus={videoState.videoRuntimeStatus}
         tauriBackend={tauriBackend}
+        nativeBackends={nativeBackends}
         busy={busy}
         busyAction={busyAction}
         videoBusy={videoState.videoBusy}
diff --git a/src/features/images/ImageStudioBoosters.tsx b/src/components/MediaStudioBoosters.tsx
similarity index 68%
rename from src/features/images/ImageStudioBoosters.tsx
rename to src/components/MediaStudioBoosters.tsx
index b9c348a..d400351 100644
--- a/src/features/images/ImageStudioBoosters.tsx
+++ b/src/components/MediaStudioBoosters.tsx
@@ -1,22 +1,29 @@
 import { useCallback, useState } from "react";
 
-import { AcceleratorCard } from "../../components/AcceleratorCard";
+import { AcceleratorCard } from "./AcceleratorCard";
 import {
   type AcceleratorId,
   getAccelerator,
   getApplicableAccelerators,
-} from "../../components/acceleratorCatalog";
-import { installPipPackage } from "../../api";
-import type { ImageModelVariant } from "../../types";
-import type { NativeBackendStatus } from "../../types/server";
+} from "./acceleratorCatalog";
+import { installPipPackage } from "../api";
+import type { NativeBackendStatus } from "../types/server";
 
 /**
- * Image Studio "Performance boosters" section (FU-056 Phase 3 / 3d).
+ * Shared "Performance boosters" section (FU-056 Phase 3 + Phase 4).
  *
- * Lives inside ``ImageStudioRuntimeBanner`` between the torch-upgrade
- * pill and the model-load summary. Renders the accelerator cards that
- * apply to the currently-selected variant — typically nunchaku +
- * sageattention on FLUX / SD3.5 / Qwen-Image, nothing on SDXL / SD1.5.
+ * Sits inside ``ImageStudioRuntimeBanner`` (before the model-load
+ * summary) and ``VideoStudioRuntimeBanner`` (before the LongLive
+ * install row), after the torch-upgrade pill. Renders the cards that
+ * apply to the currently-selected variant: typically nunchaku +
+ * sageattention on FLUX / SD3.5 / Qwen-Image, sageattention on video
+ * DiTs, plus triattention on Wan 2.1 1.3B for the LongLive bonus.
+ *
+ * The component takes a minimal ``{repo, name}`` slice of the variant
+ * rather than a concrete ``ImageModelVariant`` / ``VideoModelVariant``
+ * type — both shapes carry those two fields and the booster logic
+ * doesn't need anything else. Keeps one source of truth for the
+ * install / overlay / re-probe dance.
  *
  * Self-contained install state: clicking "Install" calls
  * ``installPipPackage`` directly, captures the result, and overlays
@@ -25,18 +32,28 @@ import type { NativeBackendStatus } from "../../types/server";
  * without waiting for the next workspace refetch.
  *
  * Renders nothing in two cases:
- *   - ``imageRuntimeStatus.realGenerationAvailable === false`` —
- *     the user can't generate anything yet; accelerators are moot.
- *     (Gated by caller in the runtime banner, not this component.)
- *   - The selected variant has no applicable accelerators
- *     (SD1.5 / SDXL / non-DiT). The whole section folds away rather
- *     than rendering an empty "Performance boosters" header.
+ *   - The selected variant has no applicable accelerators (SD1.5 /
+ *     SDXL / non-DiT). The whole section folds away rather than
+ *     rendering an empty "Performance boosters" header.
+ *   - ``selectedVariant === null`` — same reason.
+ *
+ * Callers should additionally gate the render on
+ * ``runtimeStatus.realGenerationAvailable`` so accelerators don't
+ * surface on a box that can't even run the pipeline yet.
  */
 
-export interface ImageStudioBoostersProps {
+/** Minimal structural shape of a Studio variant. Both
+ * ``ImageModelVariant`` and ``VideoModelVariant`` carry these fields,
+ * so the component accepts either. */
+export interface MediaStudioBoostersVariant {
+  repo: string;
+  name?: string;
+}
+
+export interface MediaStudioBoostersProps {
   /** The variant currently chosen in the Studio drop-down. Determines
    * which accelerators are applicable via ``getApplicableAccelerators``. */
-  selectedVariant: ImageModelVariant | null;
+  selectedVariant: MediaStudioBoostersVariant | null;
   /** Parent-provided capability snapshot — usually
    * ``workspace.runtime.nativeBackends``. ``undefined`` (older
    * backends) collapses every card to its "available" form. */
@@ -55,10 +72,10 @@ const EMPTY_INSTALL_STATE: InstallState = {
   output: null,
 };
 
-export function ImageStudioBoosters({
+export function MediaStudioBoosters({
   selectedVariant,
   nativeBackends,
-}: ImageStudioBoostersProps) {
+}: MediaStudioBoostersProps) {
   // Hold a local capabilities overlay so a fresh install flips the
   // card state immediately. The parent's ``nativeBackends`` is the
   // authoritative source; we just merge install responses on top.
@@ -121,14 +138,14 @@ export function ImageStudioBoosters({
     : (nativeBackends ?? null);
 
   return (
-    <section className="image-studio-boosters">
-      <header className="image-studio-boosters-header">
+    <section className="media-studio-boosters">
+      <header className="media-studio-boosters-header">
         <strong style={{ fontSize: "0.92rem" }}>Performance boosters</strong>
         <span className="muted-text" style={{ fontSize: "0.78rem", marginLeft: 8 }}>
           for {selectedVariant?.name ?? "the selected model"}
         </span>
       </header>
-      <div className="image-studio-boosters-stack" style={{ display: "flex", flexDirection: "column", gap: 8, marginTop: 8 }}>
+      <div className="media-studio-boosters-stack" style={{ display: "flex", flexDirection: "column", gap: 8, marginTop: 8 }}>
         {applicable.map((acceleratorId) => {
           const meta = getAccelerator(acceleratorId);
           if (!meta) return null;
diff --git a/src/features/images/ImageStudioRuntimeBanner.tsx b/src/features/images/ImageStudioRuntimeBanner.tsx
index cda9f59..3646dd6 100644
--- a/src/features/images/ImageStudioRuntimeBanner.tsx
+++ b/src/features/images/ImageStudioRuntimeBanner.tsx
@@ -16,7 +16,7 @@ import type {
 } from "../../api";
 import type { ImageModelVariant, ImageRuntimeStatus } from "../../types";
 import type { NativeBackendStatus } from "../../types/server";
-import { ImageStudioBoosters } from "./ImageStudioBoosters";
+import { MediaStudioBoosters } from "../../components/MediaStudioBoosters";
 
 
 export interface ImageStudioRuntimeBannerProps {
@@ -186,7 +186,7 @@ export function ImageStudioRuntimeBanner(props: ImageStudioRuntimeBannerProps) {
         * available yet (no point installing FLUX accelerators on a
         * box that can't even run FLUX). */}
       {imageRuntimeStatus.realGenerationAvailable ? (
-        <ImageStudioBoosters
+        <MediaStudioBoosters
           selectedVariant={selectedImageVariant}
           nativeBackends={nativeBackends}
         />
diff --git a/src/features/video/VideoDiscoverTab.tsx b/src/features/video/VideoDiscoverTab.tsx
index e579a43..a432143 100644
--- a/src/features/video/VideoDiscoverTab.tsx
+++ b/src/features/video/VideoDiscoverTab.tsx
@@ -12,6 +12,7 @@ import type {
 } from "../../types";
 import type { DiscoverSort } from "../../types/image";
 import type { VideoDiscoverTaskFilter } from "../../types/video";
+import type { NativeBackendStatus } from "../../types/server";
 import {
   compactModelSizeLabel,
   compactReleaseLabel,
@@ -25,6 +26,11 @@ import {
   videoPrimarySizeLabel,
   videoSecondarySizeLabel,
 } from "../../utils";
+import { AcceleratorCard } from "../../components/AcceleratorCard";
+import {
+  getAccelerator,
+  getApplicableAccelerators,
+} from "../../components/acceleratorCatalog";
 
 type MediaStatusFilter = "all" | "installed" | "not-installed" | "downloading" | "paused" | "failed" | "incomplete";
 type SortDir = "asc" | "desc";
@@ -51,6 +57,10 @@ export interface VideoDiscoverTabProps {
   activeVideoDownloads: Record<string, DownloadStatus>;
   selectedVideoVariant: VideoModelVariant | null;
   fileRevealLabel: string;
+  /** FU-056 Phase 4: capability snapshot for the accelerator pills
+   * rendered next to each variant. Optional — older backends collapse
+   * pills to the "available" form. */
+  nativeBackends?: NativeBackendStatus;
   longLiveStatus: VideoRuntimeStatus | null;
   installingLongLive: boolean;
   longLiveJob: LongLiveJobState | null;
@@ -235,6 +245,7 @@ export function VideoDiscoverTab({
   activeVideoDownloads,
   selectedVideoVariant,
   fileRevealLabel,
+  nativeBackends,
   longLiveStatus,
   installingLongLive,
   longLiveJob,
@@ -600,6 +611,20 @@ export function VideoDiscoverTab({
                         {variant.styleTags.slice(0, 4).map((tag) => (
                           <span key={tag} className="badge subtle">{tag}</span>
                         ))}
+                        {/* FU-056 Phase 4: read-only accelerator pills.
+                            Install lives in Video Studio's runtime banner. */}
+                        {getApplicableAccelerators(variant.repo).map((acceleratorId) => {
+                          const meta = getAccelerator(acceleratorId);
+                          if (!meta) return null;
+                          return (
+                            <AcceleratorCard
+                              key={acceleratorId}
+                              meta={meta}
+                              capabilities={nativeBackends ?? null}
+                              variant="pill"
+                            />
+                          );
+                        })}
                       </div>
                     </div>
                     <span>{variant.provider}</span>
diff --git a/src/features/video/VideoModelsTab.tsx b/src/features/video/VideoModelsTab.tsx
index 285f176..a94c39e 100644
--- a/src/features/video/VideoModelsTab.tsx
+++ b/src/features/video/VideoModelsTab.tsx
@@ -9,6 +9,7 @@ import type {
   VideoModelVariant,
   VideoRuntimeStatus,
 } from "../../types";
+import type { NativeBackendStatus } from "../../types/server";
 import {
   compactModelSizeLabel,
   compactReleaseLabel,
@@ -21,6 +22,11 @@ import {
   videoPrimarySizeLabel,
   videoSecondarySizeLabel,
 } from "../../utils";
+import { AcceleratorCard } from "../../components/AcceleratorCard";
+import {
+  getAccelerator,
+  getApplicableAccelerators,
+} from "../../components/acceleratorCatalog";
 
 type InstalledVideoSort = "name" | "provider" | "tasks" | "size" | "ram" | "date" | "status";
 type SortDir = "asc" | "desc";
@@ -35,6 +41,11 @@ export interface VideoModelsTabProps {
   videoBusyLabel: string | null;
   loadedVideoVariant: VideoModelVariant | null;
   fileRevealLabel: string;
+  /** FU-056 Phase 4: capability snapshot for the accelerator pills
+   * rendered next to each variant (sageattention + triattention for
+   * Wan / HunyuanVideo / LTX / CogVideoX / Mochi). Optional — older
+   * backends collapse pills to the "available" state. */
+  nativeBackends?: NativeBackendStatus;
   onActiveTabChange: (tab: TabId) => void;
   onOpenVideoStudio: (modelId?: string) => void;
   onVideoDownload: (repo: string, modelId?: string) => void;
@@ -154,6 +165,7 @@ export function VideoModelsTab({
   videoBusyLabel,
   loadedVideoVariant,
   fileRevealLabel,
+  nativeBackends,
   onActiveTabChange,
   onOpenVideoStudio,
   onVideoDownload,
@@ -396,6 +408,22 @@ export function VideoModelsTab({
                               {variant.styleTags.slice(0, 4).map((tag) => (
                                 <span key={tag} className="badge subtle">{tag}</span>
                               ))}
+                              {/* FU-056 Phase 4: applicable-accelerator pills.
+                                  Read-only — install action lives in the Video
+                                  Studio runtime banner so install state stays
+                                  in one place. */}
+                              {getApplicableAccelerators(variant.repo).map((acceleratorId) => {
+                                const meta = getAccelerator(acceleratorId);
+                                if (!meta) return null;
+                                return (
+                                  <AcceleratorCard
+                                    key={acceleratorId}
+                                    meta={meta}
+                                    capabilities={nativeBackends ?? null}
+                                    variant="pill"
+                                  />
+                                );
+                              })}
                             </div>
                           </div>
                           <span>{variant.provider}</span>
diff --git a/src/features/video/VideoStudioRuntimeBanner.tsx b/src/features/video/VideoStudioRuntimeBanner.tsx
index 614d3f0..232102f 100644
--- a/src/features/video/VideoStudioRuntimeBanner.tsx
+++ b/src/features/video/VideoStudioRuntimeBanner.tsx
@@ -21,6 +21,8 @@ import type {
   LongLiveJobState,
 } from "../../api";
 import type { VideoModelVariant, VideoRuntimeStatus } from "../../types";
+import type { NativeBackendStatus } from "../../types/server";
+import { MediaStudioBoosters } from "../../components/MediaStudioBoosters";
 
 
 export interface VideoStudioRuntimeBannerProps {
@@ -62,6 +64,11 @@ export interface VideoStudioRuntimeBannerProps {
   onInstallOutputDeps: () => void;
   onInstallTokenizerDeps: () => void;
   onInstallGpuRuntime: () => void;
+  /** FU-056 Phase 4: capability snapshot + selected variant for the
+   * "Performance boosters" sub-section. Both optional so older
+   * backends + early-render states collapse cleanly. */
+  selectedVideoVariant?: VideoModelVariant | null;
+  nativeBackends?: NativeBackendStatus;
 }
 
 
@@ -101,6 +108,8 @@ export function VideoStudioRuntimeBanner(props: VideoStudioRuntimeBannerProps) {
     onInstallOutputDeps,
     onInstallTokenizerDeps,
     onInstallGpuRuntime,
+    selectedVideoVariant,
+    nativeBackends,
   } = props;
 
   return (
@@ -272,6 +281,16 @@ export function VideoStudioRuntimeBanner(props: VideoStudioRuntimeBannerProps) {
           busy={busy}
         />
       ) : null}
+      {/* FU-056 Phase 4: per-model accelerator install affordances.
+        * Same shape as Image Studio's boosters section. Renders nothing
+        * when the variant has no applicable accelerators (e.g. SD-class
+        * UNet repos, non-DiT video) — the section folds away. */}
+      {videoRuntimeStatus.realGenerationAvailable ? (
+        <MediaStudioBoosters
+          selectedVariant={selectedVideoVariant ?? null}
+          nativeBackends={nativeBackends}
+        />
+      ) : null}
       {isLongLiveVariant && longLiveStatus && !longLiveStatus.realGenerationAvailable ? (
         <div className="image-runtime-actions">
           <p className="muted-text">
diff --git a/src/features/video/VideoStudioTab.tsx b/src/features/video/VideoStudioTab.tsx
index 2aa88df..957bfc9 100644
--- a/src/features/video/VideoStudioTab.tsx
+++ b/src/features/video/VideoStudioTab.tsx
@@ -13,6 +13,7 @@ import type {
   VideoModelVariant,
   VideoRuntimeStatus,
 } from "../../types";
+import type { NativeBackendStatus } from "../../types/server";
 import {
   IMAGE_CACHE_STRATEGIES,
   VIDEO_CACHE_STRATEGY_DEFAULT_THRESH,
@@ -51,6 +52,10 @@ export interface VideoStudioTabProps {
   loadedVideoVariant: VideoModelVariant | null;
   videoRuntimeStatus: VideoRuntimeStatus;
   tauriBackend: TauriBackendInfo | null;
+  /** FU-056 Phase 4: capability snapshot for the runtime banner's
+   * "Performance boosters" sub-section. Optional — defaults to
+   * undefined when the parent's workspace probe hasn't reported yet. */
+  nativeBackends?: NativeBackendStatus;
   busy: boolean;
   busyAction: string | null;
   videoBusy: boolean;
@@ -161,6 +166,7 @@ export function VideoStudioTab({
   loadedVideoVariant,
   videoRuntimeStatus,
   tauriBackend,
+  nativeBackends,
   busy,
   busyAction,
   videoBusy,
@@ -692,6 +698,8 @@ export function VideoStudioTab({
           onInstallOutputDeps={() => void handleInstallOutputDeps()}
           onInstallTokenizerDeps={() => void handleInstallTokenizerDeps()}
           onInstallGpuRuntime={() => void handleInstallGpuRuntime()}
+          selectedVideoVariant={selectedVideoVariant}
+          nativeBackends={nativeBackends}
         />
 
         <div className="image-studio-grid video-studio-top-grid" style={{ display: "grid", gap: "0.5rem", gridTemplateColumns: "1fr" }}>

From f7415351c8d885d05f0f601b6479cf745d8cc527 Mon Sep 17 00:00:00 2001
From: Cryptopoly <31970407+cryptopoly@users.noreply.github.com>
Date: Sun, 17 May 2026 16:45:05 +0100
Subject: [PATCH 07/15] feat: Chat composer DFlash install nudge (FU-056 Phase
 5)

Brings the in-app accelerator install affordance to the chat
surface. When the user is chatting with a model that has a
registered DFlash draft AND the appropriate pip package isn't
installed yet, an unobtrusive nudge bar appears above the prompt
textarea:

    DFlash speculative decoding can ~2x this model with no quality
    loss.    [Install DFlash]

Clicking the button installs the right package for the active backend
(``dflash-mlx`` on Apple Silicon MLX, ``dflash`` on CUDA vLLM)
via the existing ``handleInstallPackage`` dispatcher. The bar
hides itself once the package lands and capabilities re-probe.

The gating mirrors the AcceleratorCard pattern: the hint only
renders when all three signals line up (model in supportedModels,
package missing for the active backend, backend supports DFlash). The
backend probe + ``resolveDflashSupport`` helper already exist
from FU-034; this commit wires them into the composer.
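
The three-signal gate can be sketched as follows. This is an
illustration only: HintInputs, backendSupportsDflash and shouldShowHint
are hypothetical names, and the real component derives the first two
signals from ``dflashInfo`` via ``resolveDflashSupport`` rather than
taking booleans.

```typescript
// Hypothetical, flattened view of the three signals.
interface HintInputs {
  modelHasDraft: boolean; // model matched an entry in supportedModels
  packageInstalled: boolean; // DFlash package present for the active backend
  engine: string | null; // e.g. "mlx", "vllm", "gguf"
}

// DFlash is described as MLX / vLLM only, so anything else hides the hint.
function backendSupportsDflash(engine: string | null): boolean {
  if (!engine) return false;
  const lowered = engine.toLowerCase();
  return lowered.includes("mlx") || lowered.includes("vllm");
}

// The hint renders only when all three signals line up.
function shouldShowHint(inputs: HintInputs): boolean {
  return (
    inputs.modelHasDraft &&
    !inputs.packageInstalled &&
    backendSupportsDflash(inputs.engine)
  );
}

const showOnMlx = shouldShowHint({ modelHasDraft: true, packageInstalled: false, engine: "mlx" });
const hideOnGguf = shouldShowHint({ modelHasDraft: true, packageInstalled: false, engine: "gguf" });
```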

Drive-by fix in RuntimeControls.tsx: the existing "Install DFlash"
button next to the launch-settings toggle hard-coded
``onInstallPackage("dflash-mlx")``, which silently installed the
Apple-Silicon package on CUDA / Windows boxes running vLLM. Both
the launch-settings button and the new composer hint now route
through a shared ``dflashPackageFor(backend)`` helper that picks
the right package per backend. 3 new unit tests pin the matrix
(mlx -> dflash-mlx, vllm -> dflash, null / unknown -> dflash-mlx
as safe default).
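
For reference, the helper's behaviour on a few inputs looks like this.
The function body is the one added to runtimeSupport.ts in this patch;
the DflashPackage alias and the "vllm-wsl" engine string are
illustrative assumptions, used only to show that the match is a
case-insensitive substring check.

```typescript
type DflashPackage = "dflash-mlx" | "dflash";

// Same logic as the helper added in this patch.
function dflashPackageFor(backend: string | null | undefined): DflashPackage {
  if (backend && backend.toLowerCase().includes("vllm")) return "dflash";
  return "dflash-mlx";
}

// Inputs paired with the package the helper selects for them.
const matrix: Array<[string | null, DflashPackage]> = [
  ["mlx", dflashPackageFor("mlx")],
  ["VLLM", dflashPackageFor("VLLM")],
  ["vllm-wsl", dflashPackageFor("vllm-wsl")], // hypothetical engine string
  ["gguf", dflashPackageFor("gguf")],
  [null, dflashPackageFor(null)],
];
```

Note that GGUF falls through to ``dflash-mlx`` here; the install
button is gated separately, so that path is not expected to be reached
from the UI.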

Net change for the user: discover acceleration potential from
the place where you generate (chat composer / studio runtime
banner / catalog cards), not from a settings page you have to
remember to visit.
---
 src/App.tsx                                   |   5 +
 src/components/RuntimeControls.tsx            |  34 ++++--
 .../__tests__/runtimeSupport.test.ts          |  21 ++++
 src/components/runtimeSupport.ts              |  20 +++
 src/features/chat/ChatComposer.tsx            |  30 +++++
 src/features/chat/ChatComposerDflashHint.tsx  | 115 ++++++++++++++++++
 src/features/chat/ChatTab.tsx                 |  18 +++
 src/styles.css                                |  44 +++++++
 8 files changed, 275 insertions(+), 12 deletions(-)
 create mode 100644 src/features/chat/ChatComposerDflashHint.tsx

diff --git a/src/App.tsx b/src/App.tsx
index 213ef4b..f9e649a 100644
--- a/src/App.tsx
+++ b/src/App.tsx
@@ -1605,6 +1605,11 @@ export default function App() {
         oneTurnOverride={chat.oneTurnOverride}
         onOneTurnOverrideChange={chat.setOneTurnOverride}
         availableCacheStrategies={workspace.system.availableCacheStrategies}
+        dflashInfo={workspace.system.dflash}
+        loadedModelCanonicalRepo={workspace.runtime.loadedModel?.canonicalRepo ?? null}
+        loadedModelName={workspace.runtime.loadedModel?.name ?? null}
+        onInstallPackage={handleInstallPackage}
+        installingPackage={installingPackage}
       />
     );
   } else if (activeTab === "chat-compare") {
diff --git a/src/components/RuntimeControls.tsx b/src/components/RuntimeControls.tsx
index f4948b0..fb778f2 100644
--- a/src/components/RuntimeControls.tsx
+++ b/src/components/RuntimeControls.tsx
@@ -6,6 +6,7 @@ import { InstallLogPanel } from "./InstallLogPanel";
 import { SliderField } from "./SliderField";
 import { PerformancePreview } from "./PerformancePreview";
 import {
+  dflashPackageFor,
   isStrategyCompatible,
   resolveDflashSupport,
   strategyIncompatReason,
@@ -658,18 +659,27 @@ export function RuntimeControls({
             />
             <span>{t("dflash.label", { defaultValue: "DFlash" })}</span>
           </label>
-          {!dflashInstalled && !isGgufBackend && canInstallDflashForModel && onInstallPackage ? (
-            <button
-              type="button"
-              className="cache-strategy-install-btn"
-              disabled={installingPackage != null}
-              onClick={() => onInstallPackage("dflash-mlx")}
-            >
-              {installingPackage === "dflash-mlx"
-                ? t("dflash.installing", { defaultValue: "Installing..." })
-                : t("dflash.installButton", { defaultValue: "Install DFlash" })}
-            </button>
-          ) : null}
+          {!dflashInstalled && !isGgufBackend && canInstallDflashForModel && onInstallPackage ? (() => {
+            // FU-056 Phase 5: pick the right pip package by backend.
+            // MLX backend → ``dflash-mlx`` (Apple Silicon git+url);
+            // vLLM backend → ``dflash`` (PyPI CUDA wheel). Previously
+            // hard-coded to ``dflash-mlx`` which silently installed
+            // the wrong package on Windows / Linux CUDA boxes.
+            const pkg = dflashPackageFor(selectedBackend);
+            const inFlight = installingPackage === pkg;
+            return (
+              <button
+                type="button"
+                className="cache-strategy-install-btn"
+                disabled={installingPackage != null}
+                onClick={() => onInstallPackage(pkg)}
+              >
+                {inFlight
+                  ? t("dflash.installing", { defaultValue: "Installing..." })
+                  : t("dflash.installButton", { defaultValue: "Install DFlash" })}
+              </button>
+            );
+          })() : null}
           <button
             type="button"
             className="cache-strategy-info-btn"
diff --git a/src/components/__tests__/runtimeSupport.test.ts b/src/components/__tests__/runtimeSupport.test.ts
index 454bda5..ac31499 100644
--- a/src/components/__tests__/runtimeSupport.test.ts
+++ b/src/components/__tests__/runtimeSupport.test.ts
@@ -1,12 +1,33 @@
 import { describe, expect, it } from "vitest";
 
 import {
+  dflashPackageFor,
   isStrategyCompatible,
   resolveDflashSupport,
   sanitizeSpeculativeSelection,
   strategyIncompatReason,
 } from "../runtimeSupport";
 
+describe("dflashPackageFor()", () => {
+  it("returns dflash-mlx for the MLX backend", () => {
+    expect(dflashPackageFor("mlx")).toBe("dflash-mlx");
+    expect(dflashPackageFor("MLX")).toBe("dflash-mlx");
+  });
+
+  it("returns dflash for the vLLM CUDA backend", () => {
+    expect(dflashPackageFor("vllm")).toBe("dflash");
+    expect(dflashPackageFor("VLLM")).toBe("dflash");
+  });
+
+  it("defaults to dflash-mlx for null / unknown backends", () => {
+    expect(dflashPackageFor(null)).toBe("dflash-mlx");
+    expect(dflashPackageFor(undefined)).toBe("dflash-mlx");
+    expect(dflashPackageFor("")).toBe("dflash-mlx");
+    expect(dflashPackageFor("auto")).toBe("dflash-mlx");
+    expect(dflashPackageFor("gguf")).toBe("dflash-mlx");
+  });
+});
+
 describe("resolveDflashSupport()", () => {
   const dflashInfo = {
     available: true,
diff --git a/src/components/runtimeSupport.ts b/src/components/runtimeSupport.ts
index 3629638..34e7717 100644
--- a/src/components/runtimeSupport.ts
+++ b/src/components/runtimeSupport.ts
@@ -31,6 +31,26 @@ export function isStrategyCompatible(strategyId: string, backend: string | null
   return supported.some((candidate) => backend.includes(candidate));
 }
 
+/** FU-056 Phase 5: pick the right pip package name for DFlash given
+ * the active backend. Two distinct pip packages back the same feature:
+ *
+ *   - ``dflash-mlx`` — git+url to bstnxbt/dflash-mlx, Apple Silicon
+ *     MLX backend.
+ *   - ``dflash`` — PyPI ``dflash>=0.1.0``, CUDA / vLLM backend.
+ *
+ * The previous RuntimeControls install button hard-coded
+ * ``"dflash-mlx"``, which silently installed the wrong package on
+ * Windows / Linux CUDA boxes running vLLM. This helper picks the
+ * right one based on the engine string. Falls back to the MLX
+ * package for unknown backends — the install will fail loudly if
+ * the host doesn't match, which is better than silent no-ops.
+ */
+export function dflashPackageFor(backend: string | null | undefined): "dflash-mlx" | "dflash" {
+  if (backend && backend.toLowerCase().includes("vllm")) return "dflash";
+  return "dflash-mlx";
+}
+
+
 export function strategyIncompatReason(strategyId: string, backend: string | null | undefined): string | null {
   if (!backend || backend === "auto" || isStrategyCompatible(strategyId, backend)) return null;
   const engineLabel = backend.includes("gguf") || backend.includes("llama") ? "llama.cpp" : backend;
diff --git a/src/features/chat/ChatComposer.tsx b/src/features/chat/ChatComposer.tsx
index baa9ca7..c065d01 100644
--- a/src/features/chat/ChatComposer.tsx
+++ b/src/features/chat/ChatComposer.tsx
@@ -7,6 +7,7 @@ import type { ChatSession, ChatThinkingMode, LaunchPreferences, ModelCapabilitie
 import { MidThreadSwapMenu } from "./MidThreadSwapMenu";
 import type { KvStrategyOverride } from "./kvStrategyOverride";
 import type { SlashCommand } from "./slashCommands";
+import { ChatComposerDflashHint } from "./ChatComposerDflashHint";
 
 /**
  * Phase 2.1: extracted from ChatTab.tsx. The composer area — image
@@ -60,6 +61,17 @@ export interface ChatComposerProps {
   runSlashCommand: (cmd: SlashCommand) => void;
   handleEffortOff: () => void;
   handleEffortChange: (level: ReasoningEffortLevel) => void;
+  // FU-056 Phase 5: optional DFlash install nudge. The composer shows
+  // an inline "Install DFlash" hint when (a) the loaded model has a
+  // registered draft, (b) the package isn't installed yet on the
+  // active backend, and (c) the user is on a backend that supports
+  // it. All three conditions must hold for the hint to render; leaving
+  // out ``dflashInfo`` or ``onInstallPackage`` silently hides it.
+  dflashInfo?: SystemStats["dflash"];
+  loadedModelCanonicalRepo?: string | null;
+  loadedModelName?: string | null;
+  onInstallPackage?: (pipPackage: string) => void;
+  installingPackage?: string | null;
 }
 
 export function ChatComposer({
@@ -99,6 +111,11 @@ export function ChatComposer({
   runSlashCommand,
   handleEffortOff,
   handleEffortChange,
+  dflashInfo,
+  loadedModelCanonicalRepo,
+  loadedModelName,
+  onInstallPackage,
+  installingPackage,
 }: ChatComposerProps) {
   // FU-042: chat surface uses the ``chat`` namespace for prompt /
   // affordance copy, falling back to literal English when a key isn't
@@ -153,6 +170,19 @@ export function ChatComposer({
             ))}
           </div>
         ) : null}
+        {/* FU-056 Phase 5: DFlash install nudge above the textarea.
+            Self-gating — renders nothing when conditions aren't met
+            (no draft for this model, package already installed,
+            unsupported backend, missing dispatcher). */}
+        <ChatComposerDflashHint
+          dflashInfo={dflashInfo}
+          loadedModelEngine={loadedModelEngine}
+          loadedModelRef={loadedModelRef}
+          loadedModelCanonicalRepo={loadedModelCanonicalRepo}
+          loadedModelName={loadedModelName}
+          onInstallPackage={onInstallPackage}
+          installingPackage={installingPackage}
+        />
         <textarea
           className="text-area"
           placeholder={
diff --git a/src/features/chat/ChatComposerDflashHint.tsx b/src/features/chat/ChatComposerDflashHint.tsx
new file mode 100644
index 0000000..602d5dc
--- /dev/null
+++ b/src/features/chat/ChatComposerDflashHint.tsx
@@ -0,0 +1,115 @@
+import { useTranslation } from "react-i18next";
+import {
+  dflashPackageFor,
+  resolveDflashSupport,
+} from "../../components/runtimeSupport";
+import type { SystemStats } from "../../types";
+
+/**
+ * Inline nudge bar above the prompt textarea (FU-056 Phase 5).
+ *
+ * Renders only when:
+ *   - The currently-loaded model has a registered DFlash draft
+ *     (the model ref / canonical repo matches an entry in
+ *     ``dflashInfo.supportedModels``), AND
+ *   - The DFlash pip package isn't installed for the active
+ *     backend yet (i.e. ``dflashInfo.available === false`` for the
+ *     backend-appropriate variant), AND
+ *   - The user is on a backend that actually supports DFlash
+ *     (MLX or vLLM — not GGUF / llama.cpp).
+ *
+ * Click installs the right pip package for the backend
+ * (``dflash-mlx`` on MLX, ``dflash`` on vLLM/CUDA) via the parent's
+ * ``onInstallPackage`` callback. After install the parent
+ * refreshes capabilities, ``dflashInfo.available`` flips ``true``,
+ * and the hint folds away — no manual dismissal needed.
+ *
+ * This is the chat-surface twin of the Image / Video Studio
+ * "Performance boosters" cards: discoverable from the actual
+ * generation surface, no need to drill into Launch settings.
+ */
+
+export interface ChatComposerDflashHintProps {
+  /** Aggregate DFlash signal from ``SystemStats``. Optional — when
+   * the backend probe hasn't reported yet, the hint stays hidden. */
+  dflashInfo?: SystemStats["dflash"];
+  /** Active engine string (``"mlx"`` / ``"vllm"`` / ``"gguf"`` /
+   * ``"llama.cpp"``). Drives both the visibility gate (GGUF hides
+   * the hint entirely — DFlash isn't supported there) and the pip
+   * package picker (vLLM → ``dflash``, MLX → ``dflash-mlx``). */
+  loadedModelEngine?: string | null;
+  /** Currently-loaded model identifiers. Any of the three are
+   * matched against ``dflashInfo.supportedModels`` via the
+   * existing ``resolveDflashSupport`` helper. */
+  loadedModelRef?: string | null;
+  loadedModelCanonicalRepo?: string | null;
+  loadedModelName?: string | null;
+  /** Dispatcher — called with ``"dflash-mlx"`` or ``"dflash"`` per
+   * ``dflashPackageFor(loadedModelEngine)``. Parent owns the install
+   * lifecycle (same pattern as the Studio runtime banners). */
+  onInstallPackage?: (pipPackage: string) => void;
+  /** Which package install is currently in flight, if any.
+   * Drives the "Installing..." button label + disabled state. */
+  installingPackage?: string | null;
+}
+
+export function ChatComposerDflashHint({
+  dflashInfo,
+  loadedModelEngine,
+  loadedModelRef,
+  loadedModelCanonicalRepo,
+  loadedModelName,
+  onInstallPackage,
+  installingPackage,
+}: ChatComposerDflashHintProps) {
+  const { t } = useTranslation("runtime");
+
+  // Bail early on the cheap rejection paths so we don't waste a
+  // resolveDflashSupport call when there's nothing to render.
+  if (!dflashInfo || !onInstallPackage) return null;
+  // Already installed → nothing to nudge about.
+  if (dflashInfo.available) return null;
+
+  const support = resolveDflashSupport({
+    dflashInfo,
+    selectedBackend: loadedModelEngine ?? null,
+    modelRef: loadedModelRef ?? null,
+    canonicalRepo: loadedModelCanonicalRepo ?? null,
+    modelName: loadedModelName ?? null,
+  });
+
+  // ``modelSupported`` is strictly true when the loaded model has
+  // a registered draft. ``null`` means "unknown / empty supported
+  // list" — don't nudge in that case, the user might be on an
+  // unrelated model.
+  if (support.modelSupported !== true) return null;
+
+  const pkg = dflashPackageFor(loadedModelEngine);
+  const inFlight = installingPackage === pkg;
+
+  return (
+    <div
+      className="composer-dflash-hint"
+      role="status"
+      aria-live="polite"
+    >
+      <span className="composer-dflash-hint-icon" aria-hidden="true">⚡</span>
+      <span className="composer-dflash-hint-text">
+        {t("dflash.composerHint", {
+          defaultValue:
+            "DFlash speculative decoding can ~2× this model with no quality loss.",
+        })}
+      </span>
+      <button
+        type="button"
+        className="composer-dflash-hint-button"
+        disabled={installingPackage != null}
+        onClick={() => onInstallPackage(pkg)}
+      >
+        {inFlight
+          ? t("dflash.installing", { defaultValue: "Installing..." })
+          : t("dflash.installButton", { defaultValue: "Install DFlash" })}
+      </button>
+    </div>
+  );
+}
diff --git a/src/features/chat/ChatTab.tsx b/src/features/chat/ChatTab.tsx
index 7b69691..638e3ed 100644
--- a/src/features/chat/ChatTab.tsx
+++ b/src/features/chat/ChatTab.tsx
@@ -113,6 +113,14 @@ export interface ChatTabProps {
   /** Phase 3.2: cache strategies the system advertises so the chip
    * popover lists matching options. */
   availableCacheStrategies: SystemStats["availableCacheStrategies"];
+  /** FU-056 Phase 5: pieces the composer needs to render the
+   * inline "Install DFlash" hint above the textarea. All optional —
+   * the composer hides the affordance when any are missing. */
+  dflashInfo?: SystemStats["dflash"];
+  loadedModelCanonicalRepo?: string | null;
+  loadedModelName?: string | null;
+  onInstallPackage?: (pipPackage: string) => void;
+  installingPackage?: string | null;
 }
 
 // Avoid an unused-import diagnostic — ChatModelOption is still part of
@@ -168,6 +176,11 @@ export function ChatTab({
   oneTurnOverride,
   onOneTurnOverrideChange,
   availableCacheStrategies,
+  dflashInfo,
+  loadedModelCanonicalRepo,
+  loadedModelName,
+  onInstallPackage,
+  installingPackage,
 }: ChatTabProps) {
   const { t } = useTranslation("chat");
   const modelBusyLabel =
@@ -468,6 +481,11 @@ export function ChatTab({
           runSlashCommand={runSlashCommand}
           handleEffortOff={handleEffortOff}
           handleEffortChange={handleEffortChange}
+          dflashInfo={dflashInfo}
+          loadedModelCanonicalRepo={loadedModelCanonicalRepo}
+          loadedModelName={loadedModelName}
+          onInstallPackage={onInstallPackage}
+          installingPackage={installingPackage}
         />
       </Panel>
     </div>
diff --git a/src/styles.css b/src/styles.css
index 10b6647..4abfb1c 100644
--- a/src/styles.css
+++ b/src/styles.css
@@ -4609,6 +4609,50 @@ select.text-input {
   padding: 6px 0;
 }
 
+/* FU-056 Phase 5: inline DFlash install nudge above the prompt
+   textarea. Same visual vocabulary as ``.torch-upgrade-pill`` so the
+   nudge reads as "optional acceleration available" rather than
+   alarming the user. */
+.composer-dflash-hint {
+  display: flex;
+  align-items: center;
+  gap: 8px;
+  padding: 6px 10px;
+  margin-bottom: 6px;
+  background: rgba(80, 140, 220, 0.10);
+  border: 1px solid rgba(80, 140, 220, 0.32);
+  border-radius: 8px;
+  font-size: 0.82rem;
+  color: var(--muted-strong);
+}
+.composer-dflash-hint-icon {
+  font-size: 0.95rem;
+}
+.composer-dflash-hint-text {
+  flex: 1;
+  min-width: 0;
+}
+.composer-dflash-hint-button {
+  appearance: none;
+  border: 1px solid rgba(143, 180, 255, 0.45);
+  background: rgba(80, 140, 220, 0.20);
+  color: var(--accent-strong);
+  border-radius: 6px;
+  padding: 3px 10px;
+  font-size: 0.78rem;
+  font-weight: 500;
+  cursor: pointer;
+  white-space: nowrap;
+  transition: background 0.12s ease;
+}
+.composer-dflash-hint-button:hover:not(:disabled) {
+  background: rgba(80, 140, 220, 0.32);
+}
+.composer-dflash-hint-button:disabled {
+  opacity: 0.55;
+  cursor: not-allowed;
+}
+
 .composer-image-thumb {
   position: relative;
   width: 56px;

From 03a22f462f854e29f3d237963499dd27f4fbb329 Mon Sep 17 00:00:00 2001
From: Cryptopoly <31970407+cryptopoly@users.noreply.github.com>
Date: Sun, 17 May 2026 17:07:41 +0100
Subject: [PATCH 08/15] feat: WSL2 vLLM bridge foundation (FU-056 Phase 8)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

vLLM ships no native Windows wheels; this commit lets Windows users
install vLLM into an isolated WSL venv with one click. Three pieces:

1. **Detector** (backend_service/inference/accelerators.py):
   four new probes layered on top of the existing wsl2_available
   helper:
   - wsl_default_distro() reads "Default Distribution: Ubuntu-X" out
     of the UTF-16 ``wsl --status`` output
   - wsl_cuda_available() runs ``wsl -- nvidia-smi -L`` to confirm
     CUDA passthrough is working inside the distro
   - wsl_vllm_available() runs an ``import vllm`` inside the managed
     venv at ~/.chaosengine/vllm-venv
   - wsl_vllm_version() reads __version__ from the same venv

   Four matching fields on BackendCapabilities (wslDistroName,
   wslCudaAvailable, wslVllmAvailable, wslVllmVersion). The detail
   probes shell out via wsl.exe and can take a few seconds on a
   cold WSL service start, so they're gated behind a wsl2_active
   short-circuit — hosts without WSL pay zero subprocess cost.

2. **Install endpoint** (backend_service/routes/setup/vllm_wsl.py):
   POST /api/setup/install-vllm-wsl + /status. Background-thread job
   with five steps:
   - preflight (verify CUDA visible in WSL)
   - venv (python3 -m venv ~/.chaosengine/vllm-venv)
   - pip-upgrade (pip + setuptools + wheel)
   - pip-vllm (the long one, ~2 GB / 5-15 min)
   - verify (import vllm)

   Same single-job semantics as install-longlive: a second POST
   while running returns the running job state. The venv is rooted
   in the WSL user's $HOME (ext4-backed) so CUDA torch wheels don't
   pay the ~10x IO penalty of being on /mnt/c/.

3. **WslBridgePanel** (src/features/settings/WslBridgePanel.tsx):
   Windows-only Setup panel rendered alongside the Boost Pack on
   the Diagnostics tab. Four bucket states:
   - WSL2 not installed → ``wsl --install`` copy-paste hint + MS docs
   - WSL2 ready, no CUDA → NVIDIA WSL driver kicker link
   - WSL2 + CUDA ready, vLLM missing → one-click install button
   - vLLM ready → green pill with version + "Reinstall" affordance

   Self-probes capabilities on mount, polls install status at 1.5 Hz
   while a job is in flight, refreshes capabilities on completion so
   the bucket flips without a parent refetch. Uses the existing
   InstallLogPanel for log tail (extended to accept the new
   "vllm-wsl" variant).

Tests: 12 new probe tests covering the present / absent / cold-host
matrix for each WSL detail probe, plus 4 endpoint tests pinning the
job-state shape + the Windows platform gate + the start/status
contract. Live-verified on Windows + RTX 4090: detector returns
``distro=Ubuntu-24.04, cuda=True, vllm=False, version=None`` —
correct for the dev box right now.
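
For the detector output quoted above (WSL2 and CUDA present, vLLM
absent), the panel should land in the third of its four buckets. A
minimal Python sketch of that mapping follows. The name
``wsl_bridge_bucket`` and the bucket labels are hypothetical, chosen
for illustration only; the real panel is a TypeScript component and
may resolve its state differently.

```python
from __future__ import annotations


def wsl_bridge_bucket(
    wsl2_available: bool,
    wsl_cuda_available: bool,
    wsl_vllm_available: bool,
) -> str:
    """Map the capability flags to one of the four panel buckets.

    Hypothetical helper: the returned labels are illustrative, not
    the strings the real panel uses.
    """
    if not wsl2_available:
        return "wsl-missing"   # show the ``wsl --install`` hint
    if not wsl_cuda_available:
        return "cuda-missing"  # link to the NVIDIA WSL driver
    if not wsl_vllm_available:
        return "vllm-missing"  # offer the one-click install
    return "vllm-ready"        # version pill + "Reinstall"


# Flags as reported on the dev box: distro found, CUDA ok, no vLLM yet.
print(wsl_bridge_bucket(True, True, False))
```

The checks are ordered so that each bucket only asks the user to fix
the earliest missing prerequisite.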

Deferred to a follow-up commit: the actual engine routing so a
vLLM model load transparently launches inside the WSL venv. This
commit ships only the install path so users can stand up the venv
today; the engine wiring needs careful path translation
(/mnt/c/Users/... → Windows paths) and stdout streaming that
deserves its own focused PR.
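
To give a feel for the path translation involved, here is a rough
sketch, not the planned implementation. It assumes the default WSL
automount root (``/mnt/<drive>``); the function names are
hypothetical, and a real bridge would more likely shell out to the
``wslpath`` utility inside the distro, which also honours a custom
automount root.

```python
from __future__ import annotations

import re

# Assumption: drives are automounted at /mnt/<lowercase letter>.
_DRIVE_PATH = re.compile(r"^([A-Za-z]):[\\/](.*)$")
_MNT_PATH = re.compile(r"^/mnt/([a-z])(?:/(.*))?$")


def windows_to_wsl(path: str) -> str:
    """Translate ``C:\\Users\\me`` style paths to ``/mnt/c/Users/me``."""
    match = _DRIVE_PATH.match(path)
    if match is None:
        raise ValueError(f"not an absolute drive-letter path: {path!r}")
    drive, rest = match.groups()
    return f"/mnt/{drive.lower()}/{rest.replace(chr(92), '/')}".rstrip("/")


def wsl_to_windows(path: str) -> str:
    """Translate ``/mnt/c/Users/me`` style paths to ``C:\\Users\\me``.

    Paths outside ``/mnt/<drive>`` (for example the ext4-backed venv
    under ``$HOME``) have no drive-letter equivalent, so they raise.
    """
    match = _MNT_PATH.match(path)
    if match is None:
        raise ValueError(f"not under a /mnt/<drive> mount: {path!r}")
    drive, rest = match.group(1), match.group(2) or ""
    return f"{drive.upper()}:\\" + rest.replace("/", "\\")


print(windows_to_wsl(r"C:\Users\me\models"))
print(wsl_to_windows("/mnt/c/Users/me/models"))
```

UNC paths, relative paths and non-default automount roots are
deliberately out of scope for this sketch.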
---
 backend_service/inference/accelerators.py  | 149 +++++++++
 backend_service/inference/base.py          |  14 +
 backend_service/inference/capabilities.py  |  27 +-
 backend_service/routes/setup/__init__.py   |   2 +
 backend_service/routes/setup/vllm_wsl.py   | 333 +++++++++++++++++++++
 src/api/index.ts                           |   4 +
 src/api/setup.ts                           |  46 +++
 src/components/InstallLogPanel.tsx         |  26 +-
 src/features/settings/DiagnosticsPanel.tsx |   2 +
 src/features/settings/WslBridgePanel.tsx   | 303 +++++++++++++++++++
 src/types/server.ts                        |   9 +
 tests/test_accelerator_capabilities.py     |  95 ++++++
 tests/test_vllm_wsl_install.py             | 128 ++++++++
 13 files changed, 1127 insertions(+), 11 deletions(-)
 create mode 100644 backend_service/routes/setup/vllm_wsl.py
 create mode 100644 src/features/settings/WslBridgePanel.tsx
 create mode 100644 tests/test_vllm_wsl_install.py

diff --git a/backend_service/inference/accelerators.py b/backend_service/inference/accelerators.py
index 5b73f23..0a7684d 100644
--- a/backend_service/inference/accelerators.py
+++ b/backend_service/inference/accelerators.py
@@ -199,3 +199,152 @@ def wsl2_available() -> bool:
     except (FileNotFoundError, subprocess.TimeoutExpired, OSError):
         return False
     return result.returncode == 0
+
+
+# ---------------------------------------------------------------------------
+# WSL2 vLLM bridge probes (FU-056 Phase 8)
+#
+# vLLM ships no native Windows wheels; the practical path on a Windows +
+# CUDA box is to install vLLM inside a WSL2 Ubuntu distro and run it
+# there. These four probes feed the Setup tab's WSL bridge panel +
+# the future engine-routing layer:
+#
+#   - ``wsl_default_distro()`` → string name reported by ``wsl --status``
+#     ("Ubuntu-24.04" on the dev box). The install + run paths anchor
+#     on this so a user with multiple distros gets predictable
+#     behaviour (always use the default).
+#   - ``wsl_cuda_available()`` → ``nvidia-smi -L`` returns exit 0 from
+#     inside WSL, proving CUDA passthrough works. False on stock WSL
+#     installs without the NVIDIA WSL driver kicker.
+#   - ``wsl_vllm_available()`` / ``wsl_vllm_version()`` → the isolated
+#     venv at ``~/.chaosengine/vllm-venv`` can ``import vllm``.
+#
+# Every probe is gated on ``sys.platform == "win32"`` so macOS / Linux
+# hosts pay zero subprocess cost for these checks (the WSL bridge has
+# no meaning there). The same fallthrough pattern as ``wsl2_available``.
+# ---------------------------------------------------------------------------
+
+# Timeout sized for cold WSL service start. The first call after a
+# Windows reboot can take 3-5 s while LxssManager spins up. After
+# that, subsequent calls return in <100 ms.
+_WSL_PROBE_TIMEOUT_SEC = 5.0
+
+# Persistent isolated venv inside WSL. The path is rooted at the WSL
+# user's $HOME — ``wsl`` resolves the leading ``~`` per-distro. This
+# keeps it out of the Windows filesystem (where CUDA torch on
+# ``/mnt/c/...`` would be 10x slower than on the ext4-backed home).
+_WSL_VLLM_VENV_PATH = "~/.chaosengine/vllm-venv"
+
+
+def _run_wsl(args: list[str], timeout: float = _WSL_PROBE_TIMEOUT_SEC) -> subprocess.CompletedProcess[bytes] | None:
+    """Helper: invoke ``wsl <args>`` with a tight timeout, swallow failures.
+
+    Returns ``None`` when the subprocess can't even start (missing
+    ``wsl.exe``, host isn't Windows, etc.) so callers can branch on
+    ``is None`` rather than re-handling FileNotFoundError everywhere.
+    """
+    if sys.platform != "win32":
+        return None
+    try:
+        return subprocess.run(
+            ["wsl", *args],
+            capture_output=True,
+            timeout=timeout,
+            check=False,
+        )
+    except (FileNotFoundError, subprocess.TimeoutExpired, OSError):
+        return None
+
+
+def wsl_default_distro() -> str | None:
+    """Return the WSL default-distro name from ``wsl --status``.
+
+    The line we want looks like ``Default Distribution: Ubuntu-24.04``.
+    Windows emits ``wsl --status`` output as UTF-16 LE with a BOM
+    (a legacy Windows convention), so we decode it as UTF-16 and
+    strip any NUL bytes that survive an imperfect decode.
+    """
+    result = _run_wsl(["--status"])
+    if result is None or result.returncode != 0:
+        return None
+    # ``wsl --status`` output is UTF-16 LE. ``decode("utf-16", errors="ignore")``
+    # handles the BOM cleanly; ``replace("\x00", "")`` is a belt-and-braces
+    # guard for hosts that emit raw UTF-16 without a marker.
+    try:
+        text = result.stdout.decode("utf-16", errors="ignore").replace("\x00", "")
+    except UnicodeDecodeError:
+        text = result.stdout.decode("utf-8", errors="ignore")
+    for line in text.splitlines():
+        normalized = line.strip()
+        if normalized.lower().startswith("default distribution"):
+            # "Default Distribution: Ubuntu-24.04" → "Ubuntu-24.04"
+            _, _, value = normalized.partition(":")
+            distro = value.strip()
+            return distro or None
+    return None
+
+
+def wsl_cuda_available() -> bool:
+    """True when CUDA passthrough into WSL is functional.
+
+    ``nvidia-smi -L`` lists installed GPUs and exits 0 when the NVIDIA
+    WSL driver kicker is present. Without that kicker the binary
+    typically isn't even reachable inside the distro, so this also
+    catches the "user installed WSL but skipped the NVIDIA driver" case.
+    """
+    result = _run_wsl(["--", "nvidia-smi", "-L"])
+    if result is None or result.returncode != 0:
+        return False
+    # A successful nvidia-smi line looks like ``GPU 0: NVIDIA GeForce RTX 4090``.
+    # The ``-L`` output is ASCII so we don't need the UTF-16 dance.
+    return b"GPU " in (result.stdout or b"")
+
+
+def wsl_vllm_available() -> bool:
+    """True when the WSL isolated venv has ``vllm`` importable.
+
+    Runs the import inside the venv's python so we don't accidentally
+    pick up a system-Python install of vllm — only the venv we
+    manage. Same hygiene as the MTPLX detector that checks the
+    dedicated ``~/.chaosengine/mtplx-venv`` path.
+    """
+    result = _run_wsl(
+        [
+            "--",
+            "bash",
+            "-c",
+            (
+                f"test -x {_WSL_VLLM_VENV_PATH}/bin/python && "
+                f"{_WSL_VLLM_VENV_PATH}/bin/python -c 'import vllm' 2>/dev/null"
+            ),
+        ],
+        timeout=8.0,
+    )
+    return result is not None and result.returncode == 0
+
+
+def wsl_vllm_version() -> str | None:
+    """Read ``vllm.__version__`` from the WSL isolated venv, or ``None``.
+
+    Two-shot: runs ``wsl_vllm_available()`` first and returns ``None``
+    without attempting the version read when the venv is missing or
+    cannot import vllm.
+    """
+    if not wsl_vllm_available():
+        return None
+    result = _run_wsl(
+        [
+            "--",
+            "bash",
+            "-c",
+            (
+                f"{_WSL_VLLM_VENV_PATH}/bin/python -c "
+                "'import vllm; print(getattr(vllm, \"__version__\", \"\"))'"
+            ),
+        ],
+        timeout=8.0,
+    )
+    if result is None or result.returncode != 0:
+        return None
+    version = result.stdout.decode("utf-8", errors="ignore").strip()
+    return version or None
diff --git a/backend_service/inference/base.py b/backend_service/inference/base.py
index 47119ac..0c20194 100644
--- a/backend_service/inference/base.py
+++ b/backend_service/inference/base.py
@@ -117,6 +117,16 @@ class BackendCapabilities:
     kvpressAvailable: bool = False
     kvpressVersion: str | None = None
     wsl2Available: bool = False
+    # FU-056 Phase 8: WSL2 vLLM bridge — detail probes used by the Setup
+    # tab's WSL panel and the future engine-routing layer. ``wslDistroName``
+    # is the default-distro name from ``wsl --status`` (e.g. "Ubuntu-24.04"),
+    # ``wslCudaAvailable`` is true iff ``nvidia-smi -L`` works inside WSL,
+    # ``wslVllmAvailable`` is true iff the managed venv at
+    # ``~/.chaosengine/vllm-venv`` can import vllm.
+    wslDistroName: str | None = None
+    wslCudaAvailable: bool = False
+    wslVllmAvailable: bool = False
+    wslVllmVersion: str | None = None
     probing: bool = False
 
     def to_dict(self) -> dict[str, Any]:
@@ -151,6 +161,10 @@ def to_dict(self) -> dict[str, Any]:
             "kvpressAvailable": self.kvpressAvailable,
             "kvpressVersion": self.kvpressVersion,
             "wsl2Available": self.wsl2Available,
+            "wslDistroName": self.wslDistroName,
+            "wslCudaAvailable": self.wslCudaAvailable,
+            "wslVllmAvailable": self.wslVllmAvailable,
+            "wslVllmVersion": self.wslVllmVersion,
             "probing": self.probing,
         }
 
diff --git a/backend_service/inference/capabilities.py b/backend_service/inference/capabilities.py
index ca82854..0a0a125 100644
--- a/backend_service/inference/capabilities.py
+++ b/backend_service/inference/capabilities.py
@@ -30,6 +30,10 @@
     triattention_available,
     triattention_version,
     wsl2_available,
+    wsl_cuda_available,
+    wsl_default_distro,
+    wsl_vllm_available,
+    wsl_vllm_version,
 )
 from backend_service.inference.base import BackendCapabilities
 from backend_service.inference.binaries import (
@@ -106,6 +110,10 @@ def _initial_backend_capabilities() -> BackendCapabilities:
         kvpressAvailable=kvpress_available(),
         kvpressVersion=kvpress_version(),
         wsl2Available=wsl2_available(),
+        # FU-056 Phase 8: WSL-detail probes deferred to the full probe
+        # below. They shell out to ``wsl --`` subprocesses which can
+        # take 5-8 s each on a cold service start — too slow for the
+        # placeholder path that primes the first UI render.
         probing=True,
     )
 
@@ -147,6 +155,18 @@ def _probe_native_backends() -> BackendCapabilities:
             or (llama_server_turbo_path and _llama_server_supports(llama_server_turbo_path, "--spec-type"))
         )
 
+    # FU-056 Phase 8: WSL2 + vLLM-bridge probes. ``wsl2_available`` is
+    # cheap (``wsl --status`` returns in <100ms on warm LxssManager);
+    # the four detail probes shell out via ``wsl --`` and can take a
+    # few seconds on a cold service start, so they're gated behind the
+    # ``wsl2_active`` short-circuit to avoid paying that cost on hosts
+    # that have no WSL at all.
+    wsl2_active = wsl2_available()
+    wsl_distro = wsl_default_distro() if wsl2_active else None
+    wsl_cuda = wsl_cuda_available() if wsl2_active else False
+    wsl_vllm = wsl_vllm_available() if wsl2_active else False
+    wsl_vllm_ver = wsl_vllm_version() if wsl2_active and wsl_vllm else None
+
     return BackendCapabilities(
         pythonExecutable=python_executable,
         mlxAvailable=mlx_available,
@@ -178,7 +198,12 @@ def _probe_native_backends() -> BackendCapabilities:
         triattentionVersion=triattention_version(),
         kvpressAvailable=kvpress_available(),
         kvpressVersion=kvpress_version(),
-        wsl2Available=wsl2_available(),
+        # FU-056 Phase 8 WSL bridge state (see note above).
+        wsl2Available=wsl2_active,
+        wslDistroName=wsl_distro,
+        wslCudaAvailable=wsl_cuda,
+        wslVllmAvailable=wsl_vllm,
+        wslVllmVersion=wsl_vllm_ver,
     )
 
 
diff --git a/backend_service/routes/setup/__init__.py b/backend_service/routes/setup/__init__.py
index c8e21ae..22ccd06 100644
--- a/backend_service/routes/setup/__init__.py
+++ b/backend_service/routes/setup/__init__.py
@@ -350,6 +350,7 @@ def refresh_capabilities_endpoint(request: Request) -> dict[str, Any]:
 from backend_service.routes.setup.mtplx import router as _mtplx_router
 from backend_service.routes.setup.torch_upgrade import router as _torch_upgrade_router
 from backend_service.routes.setup.turbo import router as _turbo_router
+from backend_service.routes.setup.vllm_wsl import router as _vllm_wsl_router
 from backend_service.routes.setup.wan_install import router as _wan_install_router
 
 router.include_router(_cuda_torch_router)
@@ -359,4 +360,5 @@ def refresh_capabilities_endpoint(request: Request) -> dict[str, Any]:
 router.include_router(_mtplx_router)
 router.include_router(_torch_upgrade_router)
 router.include_router(_turbo_router)
+router.include_router(_vllm_wsl_router)
 router.include_router(_wan_install_router)
diff --git a/backend_service/routes/setup/vllm_wsl.py b/backend_service/routes/setup/vllm_wsl.py
new file mode 100644
index 0000000..127c09f
--- /dev/null
+++ b/backend_service/routes/setup/vllm_wsl.py
@@ -0,0 +1,333 @@
+"""vLLM-in-WSL installer endpoint (FU-056 Phase 8).
+
+vLLM ships no native Windows wheels; the practical path on a Windows
++ CUDA box is to install vLLM inside a WSL2 Ubuntu distro and run it
+there. This module provides the in-app installer + status poll so
+users never have to drop to PowerShell to type
+``wsl -- pip install vllm``.
+
+After a CUDA preflight check, the install runs four steps in the default WSL distro:
+
+  1. **venv** — ``python3 -m venv ~/.chaosengine/vllm-venv`` (idempotent;
+     skips when already present). The venv is rooted in the WSL user's
+     ``$HOME`` (ext4-backed) so CUDA torch wheels don't pay the
+     ~10× IO penalty of being on ``/mnt/c/...``.
+  2. **pip upgrade** — ``pip install --upgrade pip setuptools wheel``.
+     Stops pip falling back to ancient resolver shapes on Ubuntu 22.04.
+  3. **vllm** — ``pip install vllm``. Pulls torch CUDA + flash-attn +
+     friends. ~2 GB download, ~5-15 min wall time on a warm box.
+  4. **verify** — ``python -c "import vllm"`` confirms the install is
+     functional (catches half-baked builds the way Phase 1's
+     ``_safe_version`` does for the embedded runtime).
+
+Same single-job semantics as the LongLive installer: a second POST
+while running returns the running job state; completion state sticks
+around for a late status poll. Mirrors that module's structure on
+purpose so the frontend's ``InstallLogPanel`` can render WSL-vLLM
+attempts using the same job shape.
+"""
+
+from __future__ import annotations
+
+import subprocess
+import sys
+import threading
+import time
+from dataclasses import dataclass, field
+from typing import Any
+
+from fastapi import APIRouter, HTTPException, Request
+
+from backend_service.i18n import localized_detail
+
+router = APIRouter()
+
+
+_WSL_VLLM_VENV_PATH = "~/.chaosengine/vllm-venv"
+
+# Order matches the user-visible progress: preflight is the first
+# attempt row that surfaces "checking WSL", venv writes the dir,
+# pip-upgrade refreshes packaging plumbing, pip-vllm is the long
+# download, verify proves import works.
+_INSTALL_PHASES: tuple[str, ...] = (
+    "preflight",
+    "venv",
+    "pip-upgrade",
+    "pip-vllm",
+    "verify",
+)
+
+_PHASE_LABELS: dict[str, str] = {
+    "preflight": "Check WSL + CUDA",
+    "venv": "Create isolated venv",
+    "pip-upgrade": "Upgrade pip / setuptools / wheel",
+    "pip-vllm": "Install vllm (~2 GB)",
+    "verify": "Verify import",
+}
+
+# Total wall-time budget per step. The pip-vllm step gets the lion's
+# share — fresh CUDA torch wheel can be 20+ min on a slow link.
+_STEP_TIMEOUTS_SEC: dict[str, float] = {
+    "preflight": 10.0,
+    "venv": 60.0,
+    "pip-upgrade": 180.0,
+    "pip-vllm": 1800.0,
+    "verify": 30.0,
+}
+
+
+@dataclass
+class _VllmWslJobState:
+    id: str = ""
+    phase: str = "idle"  # idle | preflight | installing | done | error
+    message: str = ""
+    package_current: str | None = None
+    package_index: int = 0
+    package_total: int = len(_INSTALL_PHASES)
+    percent: float = 0.0
+    target_dir: str | None = None
+    error: str | None = None
+    started_at: float = 0.0
+    finished_at: float = 0.0
+    attempts: list[dict[str, Any]] = field(default_factory=list)
+    done: bool = False
+
+    def to_dict(self) -> dict[str, Any]:
+        return {
+            "id": self.id,
+            "phase": self.phase,
+            "message": self.message,
+            "packageCurrent": self.package_current,
+            "packageIndex": self.package_index,
+            "packageTotal": self.package_total,
+            "percent": round(self.percent, 1),
+            "targetDir": self.target_dir,
+            "error": self.error,
+            "startedAt": self.started_at,
+            "finishedAt": self.finished_at,
+            "attempts": self.attempts,
+            "done": self.done,
+        }
+
+
+_JOB = _VllmWslJobState()
+_LOCK = threading.Lock()
+
+
+def _run_wsl_step(
+    bash_command: str,
+    timeout_sec: float,
+) -> tuple[int, str]:
+    """Run ``wsl -- bash -c "<command>"`` and return ``(exit_code, output)``.
+
+    Captures stdout + stderr into a single string, keeping only the last
+    8000 characters so the response payload stays bounded. ``wsl`` itself
+    emits UTF-16 on some paths but ``bash -c`` output comes back as
+    UTF-8, so we decode permissively to avoid a corrupt-locale crash.
+    """
+    if sys.platform != "win32":
+        return 127, "WSL bridge install only runs on Windows hosts."
+    try:
+        result = subprocess.run(
+            ["wsl", "--", "bash", "-c", bash_command],
+            capture_output=True,
+            timeout=timeout_sec,
+            check=False,
+        )
+    except FileNotFoundError:
+        return 127, "wsl.exe not found on PATH."
+    except subprocess.TimeoutExpired:
+        return 124, f"Step timed out after {timeout_sec:.0f}s."
+    output = (result.stdout + result.stderr).decode("utf-8", errors="ignore")
+    return result.returncode, output[-8000:]
+
+
+def _push_attempt(job: _VllmWslJobState, phase: str, ok: bool, output: str) -> None:
+    job.attempts.append({
+        "phase": phase,
+        "package": _PHASE_LABELS.get(phase, phase),
+        "ok": ok,
+        "output": output,
+    })
+
+
+def _advance(job: _VllmWslJobState, next_phase_index: int) -> None:
+    job.package_index = next_phase_index
+    job.percent = (next_phase_index / job.package_total) * 100.0
+    if next_phase_index < job.package_total:
+        next_phase = _INSTALL_PHASES[next_phase_index]
+        job.package_current = _PHASE_LABELS.get(next_phase, next_phase)
+        job.message = f"Running: {job.package_current}"
+
+
+def _job_worker() -> None:
+    """Run the install steps sequentially, streaming each into ``job.attempts``.
+
+    Any subprocess returning non-zero flips the job to ``error`` and
+    stops the chain. Late status polls see the failing attempt's
+    captured output so the UI can surface the pip error without a
+    separate log fetch.
+    """
+    job = _JOB
+    job.phase = "installing"
+    job.package_current = _PHASE_LABELS["preflight"]
+    job.target_dir = _WSL_VLLM_VENV_PATH
+
+    # Step 1 — preflight. Confirm WSL responds + CUDA passthrough works
+    # before paying for the venv + pip download. Fails fast if the user
+    # tried to install on a box where ``nvidia-smi -L`` doesn't work
+    # inside WSL (the NVIDIA WSL driver kicker hasn't been installed
+    # on the Windows host).
+    code, output = _run_wsl_step(
+        "nvidia-smi -L",
+        _STEP_TIMEOUTS_SEC["preflight"],
+    )
+    _push_attempt(job, "preflight", ok=(code == 0), output=output)
+    if code != 0:
+        job.phase = "error"
+        job.error = (
+            "CUDA isn't reachable inside WSL. Install the NVIDIA WSL "
+            "driver on Windows first: https://docs.nvidia.com/cuda/wsl-user-guide/"
+        )
+        job.message = job.error
+        job.finished_at = time.time()
+        job.done = True
+        return
+    _advance(job, 1)
+
+    # Step 2 — venv. ``python3 -m venv`` is idempotent: if the dir
+    # already exists Python silently re-creates the pyvenv.cfg shim
+    # without nuking site-packages. We still wrap in ``mkdir -p`` so
+    # the parent ``~/.chaosengine`` exists on a clean WSL host.
+    code, output = _run_wsl_step(
+        (
+            "mkdir -p $HOME/.chaosengine && "
+            f"python3 -m venv {_WSL_VLLM_VENV_PATH}"
+        ),
+        _STEP_TIMEOUTS_SEC["venv"],
+    )
+    _push_attempt(job, "venv", ok=(code == 0), output=output)
+    if code != 0:
+        job.phase = "error"
+        job.error = "Failed to create the WSL venv. See output above."
+        job.message = job.error
+        job.finished_at = time.time()
+        job.done = True
+        return
+    _advance(job, 2)
+
+    # Step 3 — pip upgrade. Ubuntu 22.04 ships pip 22.x; the vllm
+    # wheel resolution wants pip ≥ 23.0 to pick the right CUDA tag.
+    code, output = _run_wsl_step(
+        (
+            f"{_WSL_VLLM_VENV_PATH}/bin/python -m pip install "
+            "--upgrade pip setuptools wheel"
+        ),
+        _STEP_TIMEOUTS_SEC["pip-upgrade"],
+    )
+    _push_attempt(job, "pip-upgrade", ok=(code == 0), output=output)
+    if code != 0:
+        job.phase = "error"
+        job.error = "Failed to upgrade pip in the WSL venv."
+        job.message = job.error
+        job.finished_at = time.time()
+        job.done = True
+        return
+    _advance(job, 3)
+
+    # Step 4 — the actual pip install. Long step (~2 GB download +
+    # extraction). Output is captured by ``subprocess.run``, so pip's log
+    # reaches the attempt row only after the step finishes.
+    code, output = _run_wsl_step(
+        f"{_WSL_VLLM_VENV_PATH}/bin/pip install vllm",
+        _STEP_TIMEOUTS_SEC["pip-vllm"],
+    )
+    _push_attempt(job, "pip-vllm", ok=(code == 0), output=output)
+    if code != 0:
+        job.phase = "error"
+        job.error = "pip install vllm failed. See output above."
+        job.message = job.error
+        job.finished_at = time.time()
+        job.done = True
+        return
+    _advance(job, 4)
+
+    # Step 5 — verify the install is functional. Catches the
+    # half-baked-install failure mode we hit with torch on Windows
+    # (DLLs present but Python source missing).
+    code, output = _run_wsl_step(
+        (
+            f"{_WSL_VLLM_VENV_PATH}/bin/python -c "
+            "'import vllm; print(vllm.__version__)'"
+        ),
+        _STEP_TIMEOUTS_SEC["verify"],
+    )
+    _push_attempt(job, "verify", ok=(code == 0), output=output)
+    if code != 0:
+        job.phase = "error"
+        job.error = "vllm installed but 'import vllm' failed inside the WSL venv."
+        job.message = job.error
+        job.finished_at = time.time()
+        job.done = True
+        return
+
+    job.phase = "done"
+    job.percent = 100.0
+    job.message = f"vLLM ready in WSL ({output.strip() or 'version unknown'})."
+    job.finished_at = time.time()
+    job.done = True
+
+
+@router.post("/api/setup/install-vllm-wsl")
+def start_install_vllm_wsl(request: Request) -> dict[str, Any]:
+    """Kick off the WSL vLLM install on a background thread.
+
+    Idempotent: a second POST while a job is in flight returns the
+    running state instead of double-booting the install. Same shape
+    as ``install-gpu-bundle`` so the frontend pattern stays uniform.
+    """
+    if sys.platform != "win32":
+        raise HTTPException(
+            status_code=400,
+            detail=localized_detail(
+                request,
+                "vLLM-in-WSL install only runs on Windows hosts.",
+            ),
+        )
+
+    with _LOCK:
+        if _JOB.phase in {"preflight", "installing"}:
+            return _JOB.to_dict()
+
+        _JOB.id = f"vllm-wsl-{int(time.time() * 1000)}"
+        _JOB.phase = "preflight"
+        _JOB.message = "Starting vLLM install in WSL..."
+        _JOB.package_current = _PHASE_LABELS["preflight"]
+        _JOB.package_index = 0
+        _JOB.package_total = len(_INSTALL_PHASES)
+        _JOB.percent = 0.0
+        _JOB.target_dir = _WSL_VLLM_VENV_PATH
+        _JOB.error = None
+        _JOB.started_at = time.time()
+        _JOB.finished_at = 0.0
+        _JOB.attempts = []
+        _JOB.done = False
+
+        thread = threading.Thread(
+            target=_job_worker,
+            name="chaosengine-vllm-wsl-install",
+            daemon=True,
+        )
+        thread.start()
+
+    return _JOB.to_dict()
+
+
+@router.get("/api/setup/install-vllm-wsl/status")
+def vllm_wsl_status() -> dict[str, Any]:
+    """Snapshot of the most-recent WSL vLLM install attempt.
+
+    Safe to poll at 1-2 Hz. Returns ``phase="idle"`` before any
+    install has been started in this backend session.
+    """
+    return _JOB.to_dict()
diff --git a/src/api/index.ts b/src/api/index.ts
index a3930fd..a376c63 100644
--- a/src/api/index.ts
+++ b/src/api/index.ts
@@ -503,6 +503,7 @@ export {
   getMtplxInstallStatus,
   getMtplxStatus,
   getTorchUpgradeStatus,
+  getVllmWslInstallStatus,
   getWanInstallStatus,
   getWanInventory,
   installCudaTorch,
@@ -513,6 +514,7 @@ export {
   startLongLiveInstall,
   startMtplxInstall,
   startTorchUpgrade,
+  startVllmWslInstall,
   startWanInstall,
 } from "./setup";
 export type {
@@ -535,6 +537,8 @@ export type {
   TorchUpgradeType,
   TorchUpgradeUnavailableReason,
   TurboUpdateInfo,
+  VllmWslAttempt,
+  VllmWslJobState,
   WanConvertStatusFields,
   WanInstallAttempt,
   WanInstallJobState,
diff --git a/src/api/setup.ts b/src/api/setup.ts
index a264876..fd610a7 100644
--- a/src/api/setup.ts
+++ b/src/api/setup.ts
@@ -176,6 +176,52 @@ export async function getLongLiveInstallStatus(): Promise<LongLiveJobState> {
   return await fetchJson<LongLiveJobState>("/api/setup/install-longlive/status", 10000);
 }
 
+// ---------------------------------------------------------------------------
+// FU-056 Phase 8: vLLM-in-WSL install (Windows hosts only)
+//
+// Same background-job shape as LongLiveJobState so the existing
+// InstallLogPanel renders it without modification. The backend
+// endpoint is gated on ``sys.platform == 'win32'`` and rejects with
+// HTTP 400 on macOS / Linux — callers should gate the UI on
+// ``nativeBackends.wsl2Available`` rather than letting the user POST.
+// ---------------------------------------------------------------------------
+
+export interface VllmWslAttempt {
+  phase: string;
+  package: string;
+  ok: boolean;
+  output: string;
+  // Always undefined for vllm-wsl attempts. Declared so the shared
+  // ``InstallLogPanel`` can read it from any member of the job union
+  // without a per-job branch; the LongLive / MTPLX attempts carry
+  // the same optional field.
+  indexUrl?: string;
+}
+
+export interface VllmWslJobState {
+  id: string;
+  phase: "idle" | "preflight" | "installing" | "done" | "error";
+  message: string;
+  packageCurrent: string | null;
+  packageIndex: number;
+  packageTotal: number;
+  percent: number;
+  targetDir: string | null;
+  error: string | null;
+  startedAt: number;
+  finishedAt: number;
+  attempts: VllmWslAttempt[];
+  done: boolean;
+}
+
+export async function startVllmWslInstall(): Promise<VllmWslJobState> {
+  return await postJson<VllmWslJobState>("/api/setup/install-vllm-wsl", {}, 15000);
+}
+
+export async function getVllmWslInstallStatus(): Promise<VllmWslJobState> {
+  return await fetchJson<VllmWslJobState>("/api/setup/install-vllm-wsl/status", 10000);
+}
+
 // ---------------------------------------------------------------------------
 // mlx-video Wan install (FU-025) — Apple Silicon only
 // ---------------------------------------------------------------------------
diff --git a/src/components/InstallLogPanel.tsx b/src/components/InstallLogPanel.tsx
index b1a508f..04b9323 100644
--- a/src/components/InstallLogPanel.tsx
+++ b/src/components/InstallLogPanel.tsx
@@ -1,14 +1,18 @@
 import { useEffect, useRef } from "react";
 import { useTranslation } from "react-i18next";
 import type { TFunction } from "i18next";
-import type { GpuBundleJobState, LongLiveJobState, MtplxJobState } from "../api";
-
-// The panel renders any background install job — GPU bundle, LongLive, or
-// MTPLX. All share the core fields (phase / message / attempts / progress
-// counters / targetDir). Treating the prop as a union keeps all surfaces
-// using one component without duplicating auto-scroll, pip-noise filter, and
-// terminal layout.
-export type InstallJobState = GpuBundleJobState | LongLiveJobState | MtplxJobState;
+import type { GpuBundleJobState, LongLiveJobState, MtplxJobState, VllmWslJobState } from "../api";
+
+// The panel renders any background install job — GPU bundle, LongLive,
+// MTPLX, or WSL vLLM. All share the core fields (phase / message /
+// attempts / progress counters / targetDir). Treating the prop as a
+// union keeps all surfaces using one component without duplicating
+// auto-scroll, pip-noise filter, and terminal layout.
+export type InstallJobState =
+  | GpuBundleJobState
+  | LongLiveJobState
+  | MtplxJobState
+  | VllmWslJobState;
 
 // Optional fields read by the meta line. ``GpuBundleJobState`` has these;
 // ``LongLiveJobState`` doesn't. Centralised here so the meta renderer
@@ -25,7 +29,7 @@ interface InstallLogPanelProps {
   job: InstallJobState | null;
   // Title shown in the collapsed summary. Defaults to the GPU bundle
   // wording so existing call sites don't need to pass it.
-  variant?: "gpu-bundle" | "longlive" | "mtplx";
+  variant?: "gpu-bundle" | "longlive" | "mtplx" | "vllm-wsl";
 }
 
 // Single scrollable terminal rendering the GPU bundle install progress.
@@ -100,11 +104,13 @@ function InstallLogMeta({ job, t }: { job: InstallJobState; t: TFunction }) {
   return <div className="install-log-meta">{fragments.join(" · ")}</div>;
 }
 
-function formatStatusLabel(job: InstallJobState, variant: "gpu-bundle" | "longlive" | "mtplx", t: TFunction): string {
+function formatStatusLabel(job: InstallJobState, variant: "gpu-bundle" | "longlive" | "mtplx" | "vllm-wsl", t: TFunction): string {
   const noun = variant === "longlive"
     ? t("installLog.statusNoun.longlive", { defaultValue: "LongLive install" })
     : variant === "mtplx"
     ? t("installLog.statusNoun.mtplx", { defaultValue: "MTPLX install" })
+    : variant === "vllm-wsl"
+    ? t("installLog.statusNoun.vllmWsl", { defaultValue: "vLLM-in-WSL install" })
     : t("installLog.statusNoun.gpuBundle", { defaultValue: "Install" });
   if (job.phase === "error" || job.error) return t("installLog.status.failed", { noun, defaultValue: `${noun} failed — see log` });
   if (job.phase === "done") return t("installLog.status.complete", { noun, defaultValue: `${noun} complete — see log` });
diff --git a/src/features/settings/DiagnosticsPanel.tsx b/src/features/settings/DiagnosticsPanel.tsx
index 41da875..c3b0907 100644
--- a/src/features/settings/DiagnosticsPanel.tsx
+++ b/src/features/settings/DiagnosticsPanel.tsx
@@ -13,6 +13,7 @@ import {
   type StorageTopResponse,
 } from "../../api";
 import { AcceleratorsBoostPack } from "./AcceleratorsBoostPack";
+import { WslBridgePanel } from "./WslBridgePanel";
 
 // In-app troubleshooting panel. Surfaces OS, hardware, runtime paths,
 // GPU state, env vars, and the backend log tail without asking users to
@@ -415,6 +416,7 @@ export function DiagnosticsPanel({ backendOnline, onRestartServer, busyAction }:
         </div>
       ) : null}
       <AcceleratorsBoostPack backendOnline={backendOnline} />
+      <WslBridgePanel backendOnline={backendOnline} />
       <StorageTopSection backendOnline={backendOnline} />
     </Panel>
   );
diff --git a/src/features/settings/WslBridgePanel.tsx b/src/features/settings/WslBridgePanel.tsx
new file mode 100644
index 0000000..637c4d3
--- /dev/null
+++ b/src/features/settings/WslBridgePanel.tsx
@@ -0,0 +1,303 @@
+import { useCallback, useEffect, useRef, useState } from "react";
+
+import { InstallLogPanel } from "../../components/InstallLogPanel";
+import {
+  getVllmWslInstallStatus,
+  refreshCapabilities,
+  startVllmWslInstall,
+  type VllmWslJobState,
+} from "../../api";
+import type { NativeBackendStatus } from "../../types/server";
+
+/**
+ * Windows-only Setup panel that surfaces the WSL2 vLLM bridge state
+ * + one-click installer (FU-056 Phase 8).
+ *
+ * vLLM ships no native Windows wheels, so users on RTX Windows boxes
+ * can't get to the vLLM lane without dropping to PowerShell. This
+ * panel makes the install one click: behind the scenes it spawns a
+ * background job that creates an isolated venv inside the user's
+ * default WSL distro at ``~/.chaosengine/vllm-venv``, pip-installs
+ * vllm (~2 GB), and verifies the import works.
+ *
+ * Five state buckets:
+ *
+ *   1. **Not Windows** — render nothing. Callers may gate by platform
+ *      as well, but the panel checks the user agent itself and bails.
+ *   2. **WSL2 not installed** — surface the official ``wsl --install``
+ *      command with a copy-paste hint + a link to Microsoft's docs.
+ *      The user reboots, re-opens ChaosEngineAI, the panel flips to
+ *      bucket 3.
+ *   3. **WSL2 ready, CUDA not visible inside WSL** — the NVIDIA WSL
+ *      driver kicker isn't installed on the Windows host. Surface a
+ *      link to the NVIDIA WSL guide; we can't install drivers from
+ *      inside the app.
+ *   4. **WSL2 + CUDA ready, vLLM not installed** — the install
+ *      button. Background-job pattern (start → poll status) same as
+ *      LongLive / GPU bundle.
+ *   5. **vLLM ready** — green pill with the version. The install
+ *      button collapses to "Reinstall" so a user who hit a half-baked
+ *      build can recover without dropping to PowerShell.
+ *
+ * Self-contained: probes capabilities on mount, polls install status
+ * every 1.5 s while a job is in flight, refreshes capabilities on
+ * completion so the parent's workspace state catches up.
+ */
+
+export interface WslBridgePanelProps {
+  /** Set false until backend health check has cleared. Probe + install
+   * both need the backend up. */
+  backendOnline: boolean;
+}
+
+const POLL_INTERVAL_MS = 1500;
+// Pulled out as a constant so the link in the "WSL2 not installed"
+// bucket points at the live Microsoft doc page rather than burying
+// the URL inline. Bump when MS retires this page (unlikely soon).
+const WSL_INSTALL_DOCS_URL = "https://learn.microsoft.com/en-us/windows/wsl/install";
+const NVIDIA_WSL_DOCS_URL = "https://docs.nvidia.com/cuda/wsl-user-guide/";
+
+export function WslBridgePanel({ backendOnline }: WslBridgePanelProps) {
+  const [caps, setCaps] = useState<NativeBackendStatus | null>(null);
+  const [capsError, setCapsError] = useState<string | null>(null);
+  const [job, setJob] = useState<VllmWslJobState | null>(null);
+  const [starting, setStarting] = useState(false);
+  const pollTimer = useRef<number | null>(null);
+
+  // ----- One-shot capability probe on mount / backend-online toggle -----
+  const probe = useCallback(async () => {
+    if (!backendOnline) return;
+    try {
+      const next = await refreshCapabilities();
+      setCaps(next as unknown as NativeBackendStatus);
+      setCapsError(null);
+    } catch (err) {
+      setCapsError(err instanceof Error ? err.message : String(err));
+    }
+  }, [backendOnline]);
+
+  useEffect(() => {
+    void probe();
+  }, [probe]);
+
+  // ----- Install status poll loop -----
+  // Polls every 1.5 s while the job is in flight; stops on done / error.
+  // Re-probes capabilities on completion so the bucket switches without
+  // a parent refetch.
+  useEffect(() => {
+    if (!job) return;
+    if (job.done || job.phase === "done" || job.phase === "error") {
+      if (pollTimer.current) {
+        window.clearInterval(pollTimer.current);
+        pollTimer.current = null;
+      }
+      if (job.phase === "done") {
+        void probe();
+      }
+      return;
+    }
+    if (pollTimer.current) return;
+    pollTimer.current = window.setInterval(() => {
+      void (async () => {
+        try {
+          const next = await getVllmWslInstallStatus();
+          setJob(next);
+        } catch {
+          // Best-effort poll; swallow errors so we don't disrupt the UI.
+          // The next tick will retry.
+        }
+      })();
+    }, POLL_INTERVAL_MS);
+    return () => {
+      if (pollTimer.current) {
+        window.clearInterval(pollTimer.current);
+        pollTimer.current = null;
+      }
+    };
+  }, [job, probe]);
+
+  const handleInstall = useCallback(async () => {
+    if (starting) return;
+    setStarting(true);
+    try {
+      const next = await startVllmWslInstall();
+      setJob(next);
+    } catch (err) {
+      const message = err instanceof Error ? err.message : String(err);
+      // Synthesize an error job so the InstallLogPanel can surface the
+      // failure even when the start endpoint itself bailed (e.g. user
+      // managed to click on a non-Windows host through a stale UI).
+      setJob({
+        id: "vllm-wsl-error",
+        phase: "error",
+        message,
+        packageCurrent: null,
+        packageIndex: 0,
+        packageTotal: 0,
+        percent: 0,
+        targetDir: null,
+        error: message,
+        startedAt: Date.now() / 1000,
+        finishedAt: Date.now() / 1000,
+        attempts: [],
+        done: true,
+      });
+    } finally {
+      setStarting(false);
+    }
+  }, [starting]);
+
+  // ----- Bucket selection -----
+  // The panel itself only renders on Windows; macOS / Linux callers
+  // shouldn't see it. Bail defensively if a caller forgets to gate.
+  const isWindows = typeof navigator !== "undefined"
+    && navigator.userAgent.toLowerCase().includes("windows");
+  if (!isWindows) return null;
+
+  if (!backendOnline) {
+    return (
+      <section className="wsl-bridge-panel" style={{ marginTop: 18 }}>
+        <header>
+          <strong style={{ fontSize: "0.95rem" }}>WSL2 vLLM bridge</strong>
+        </header>
+        <p className="muted-text" style={{ fontSize: "0.84rem", margin: "4px 0 0" }}>
+          Backend offline — start the sidecar to probe WSL state.
+        </p>
+      </section>
+    );
+  }
+
+  if (capsError) {
+    return (
+      <section className="wsl-bridge-panel" style={{ marginTop: 18 }}>
+        <header>
+          <strong style={{ fontSize: "0.95rem" }}>WSL2 vLLM bridge</strong>
+        </header>
+        <p className="muted-text" style={{ color: "rgb(252, 165, 165)", fontSize: "0.82rem", margin: "4px 0 0" }}>
+          Could not read WSL state: {capsError}
+        </p>
+      </section>
+    );
+  }
+
+  if (!caps) {
+    return (
+      <section className="wsl-bridge-panel" style={{ marginTop: 18 }}>
+        <header>
+          <strong style={{ fontSize: "0.95rem" }}>WSL2 vLLM bridge</strong>
+        </header>
+        <p className="muted-text" style={{ fontSize: "0.84rem", margin: "4px 0 0" }}>
+          Probing WSL state...
+        </p>
+      </section>
+    );
+  }
+
+  const wsl2 = caps.wsl2Available === true;
+  const wslCuda = caps.wslCudaAvailable === true;
+  const vllmInstalled = caps.wslVllmAvailable === true;
+  const distro = caps.wslDistroName ?? "WSL";
+  const vllmVersion = caps.wslVllmVersion ?? null;
+
+  // Common chrome — same header on every bucket so the panel reads as
+  // one section the user can find by name.
+  const header = (
+    <header className="wsl-bridge-panel-header" style={{ display: "flex", alignItems: "baseline", gap: 8 }}>
+      <strong style={{ fontSize: "0.95rem" }}>WSL2 vLLM bridge</strong>
+      {wsl2 && wslCuda && vllmInstalled ? (
+        <span
+          className="badge subtle"
+          style={{
+            background: "rgba(80, 180, 100, 0.22)",
+            color: "#8fd99e",
+            padding: "2px 8px",
+            borderRadius: 10,
+            fontSize: "0.72rem",
+            fontWeight: 600,
+          }}
+        >
+          ✓ Ready{vllmVersion ? ` · v${vllmVersion}` : ""}
+        </span>
+      ) : null}
+    </header>
+  );
+
+  // Bucket 2: WSL2 not installed at all.
+  if (!wsl2) {
+    return (
+      <section className="wsl-bridge-panel" style={{ marginTop: 18 }}>
+        {header}
+        <p className="muted-text" style={{ fontSize: "0.84rem", margin: "6px 0 0" }}>
+          WSL2 isn't installed on this Windows host. vLLM ships no native
+          Windows wheels, so the practical path is to run vLLM inside a
+          WSL2 Ubuntu distro.
+        </p>
+        <p className="muted-text" style={{ fontSize: "0.82rem", margin: "8px 0 0" }}>
+          Open an admin PowerShell and run:
+        </p>
+        <pre style={{
+          margin: "4px 0",
+          padding: "8px 10px",
+          background: "rgba(0, 0, 0, 0.35)",
+          borderRadius: 6,
+          fontSize: "0.8rem",
+          fontFamily: "ui-monospace, SFMono-Regular, Menlo, Consolas, monospace",
+        }}>wsl --install</pre>
+        <p className="muted-text" style={{ fontSize: "0.78rem", margin: 0 }}>
+          Reboot when prompted, then reopen ChaosEngineAI.{" "}
+          <a href={WSL_INSTALL_DOCS_URL} target="_blank" rel="noreferrer">Microsoft docs</a>
+        </p>
+      </section>
+    );
+  }
+
+  // Bucket 3: WSL2 up but CUDA not reachable inside it.
+  if (!wslCuda) {
+    return (
+      <section className="wsl-bridge-panel" style={{ marginTop: 18 }}>
+        {header}
+        <p className="muted-text" style={{ fontSize: "0.84rem", margin: "6px 0 0" }}>
+          WSL2 is installed ({distro}), but{" "}
+          <code style={{ fontSize: "0.82em" }}>nvidia-smi</code> isn't
+          reachable inside the distro. Install the NVIDIA WSL driver
+          kicker on Windows so CUDA can pass through from your GPU
+          driver into WSL.
+        </p>
+        <p className="muted-text" style={{ fontSize: "0.78rem", margin: "8px 0 0" }}>
+          <a href={NVIDIA_WSL_DOCS_URL} target="_blank" rel="noreferrer">NVIDIA WSL guide ↗</a>
+        </p>
+      </section>
+    );
+  }
+
+  // Buckets 4 and 5: ready to install / already installed.
+  const installRunning = job != null
+    && (job.phase === "preflight" || job.phase === "installing");
+  const buttonLabel = installRunning
+    ? job?.message || "Installing..."
+    : vllmInstalled
+      ? "Reinstall vLLM in WSL"
+      : "Install vLLM in WSL";
+
+  return (
+    <section className="wsl-bridge-panel" style={{ marginTop: 18 }}>
+      {header}
+      <p className="muted-text" style={{ fontSize: "0.84rem", margin: "6px 0 0" }}>
+        {vllmInstalled
+          ? `vLLM${vllmVersion ? ` ${vllmVersion}` : ""} is installed in ${distro} at ~/.chaosengine/vllm-venv. The desktop app can route vLLM model loads through this venv.`
+          : `WSL2 (${distro}) + CUDA passthrough are ready. Install vLLM into an isolated venv at ~/.chaosengine/vllm-venv (~2 GB download, ~5-15 min on a warm box).`}
+      </p>
+      <div className="button-row" style={{ marginTop: 10 }}>
+        <button
+          type="button"
+          className="secondary-button"
+          onClick={() => void handleInstall()}
+          disabled={starting || installRunning}
+        >
+          {buttonLabel}
+        </button>
+      </div>
+      <InstallLogPanel job={job} variant="vllm-wsl" />
+    </section>
+  );
+}
diff --git a/src/types/server.ts b/src/types/server.ts
index 3d1c1e5..b812da3 100644
--- a/src/types/server.ts
+++ b/src/types/server.ts
@@ -145,5 +145,14 @@ export interface NativeBackendStatus {
   kvpressAvailable?: boolean;
   kvpressVersion?: string | null;
   wsl2Available?: boolean;
+  // FU-056 Phase 8: WSL2 vLLM bridge state. ``wslDistroName`` is the
+  // default-distro name from ``wsl --status`` (e.g. "Ubuntu-24.04"),
+  // ``wslCudaAvailable`` is true iff ``nvidia-smi -L`` works inside
+  // WSL, ``wslVllmAvailable`` is true iff the managed venv at
+  // ``~/.chaosengine/vllm-venv`` can import vllm.
+  wslDistroName?: string | null;
+  wslCudaAvailable?: boolean;
+  wslVllmAvailable?: boolean;
+  wslVllmVersion?: string | null;
   probing?: boolean;
 }
diff --git a/tests/test_accelerator_capabilities.py b/tests/test_accelerator_capabilities.py
index a9c00dc..3dc9388 100644
--- a/tests/test_accelerator_capabilities.py
+++ b/tests/test_accelerator_capabilities.py
@@ -168,6 +168,101 @@ def test_returns_false_on_subprocess_timeout(self):
                 self.assertFalse(accelerators.wsl2_available())
 
 
+class WslDetailProbeTests(unittest.TestCase):
+    """FU-056 Phase 8: WSL2 + vLLM-bridge detail probes. All four
+    return safe default values off Windows so the capability layer
+    never throws on a macOS / Linux host."""
+
+    def test_default_distro_off_windows_returns_none(self):
+        with patch.object(accelerators.sys, "platform", "linux"):
+            self.assertIsNone(accelerators.wsl_default_distro())
+
+    def test_default_distro_parses_status_output(self):
+        # ``wsl --status`` emits UTF-16 LE. Synthesize that shape so
+        # the decoder is exercised.
+        status_text = (
+            "Default Distribution: Ubuntu-24.04\r\n"
+            "Default Version: 2\r\n"
+        )
+        fake_result = MagicMock(
+            returncode=0,
+            stdout=status_text.encode("utf-16-le"),
+        )
+        with patch.object(accelerators.sys, "platform", "win32"):
+            with patch.object(accelerators.subprocess, "run", return_value=fake_result):
+                self.assertEqual(accelerators.wsl_default_distro(), "Ubuntu-24.04")
+
+    def test_default_distro_returns_none_when_no_default_line(self):
+        fake_result = MagicMock(
+            returncode=0,
+            stdout="Default Version: 2\r\n".encode("utf-16-le"),
+        )
+        with patch.object(accelerators.sys, "platform", "win32"):
+            with patch.object(accelerators.subprocess, "run", return_value=fake_result):
+                self.assertIsNone(accelerators.wsl_default_distro())
+
+    def test_default_distro_returns_none_when_wsl_exits_nonzero(self):
+        fake_result = MagicMock(returncode=1, stdout=b"")
+        with patch.object(accelerators.sys, "platform", "win32"):
+            with patch.object(accelerators.subprocess, "run", return_value=fake_result):
+                self.assertIsNone(accelerators.wsl_default_distro())
+
+    def test_cuda_available_off_windows_returns_false(self):
+        with patch.object(accelerators.sys, "platform", "darwin"):
+            self.assertFalse(accelerators.wsl_cuda_available())
+
+    def test_cuda_available_true_when_nvidia_smi_lists_gpu(self):
+        fake_result = MagicMock(
+            returncode=0,
+            stdout=b"GPU 0: NVIDIA GeForce RTX 4090 (UUID: GPU-...)\n",
+        )
+        with patch.object(accelerators.sys, "platform", "win32"):
+            with patch.object(accelerators.subprocess, "run", return_value=fake_result):
+                self.assertTrue(accelerators.wsl_cuda_available())
+
+    def test_cuda_available_false_when_nvidia_smi_returns_empty(self):
+        fake_result = MagicMock(returncode=0, stdout=b"")
+        with patch.object(accelerators.sys, "platform", "win32"):
+            with patch.object(accelerators.subprocess, "run", return_value=fake_result):
+                self.assertFalse(accelerators.wsl_cuda_available())
+
+    def test_cuda_available_false_when_nvidia_smi_missing(self):
+        fake_result = MagicMock(returncode=127, stdout=b"")
+        with patch.object(accelerators.sys, "platform", "win32"):
+            with patch.object(accelerators.subprocess, "run", return_value=fake_result):
+                self.assertFalse(accelerators.wsl_cuda_available())
+
+    def test_vllm_available_off_windows_returns_false(self):
+        with patch.object(accelerators.sys, "platform", "linux"):
+            self.assertFalse(accelerators.wsl_vllm_available())
+
+    def test_vllm_available_true_when_import_returns_zero(self):
+        fake_result = MagicMock(returncode=0, stdout=b"", stderr=b"")
+        with patch.object(accelerators.sys, "platform", "win32"):
+            with patch.object(accelerators.subprocess, "run", return_value=fake_result):
+                self.assertTrue(accelerators.wsl_vllm_available())
+
+    def test_vllm_available_false_when_import_fails(self):
+        fake_result = MagicMock(returncode=1, stdout=b"", stderr=b"ModuleNotFoundError")
+        with patch.object(accelerators.sys, "platform", "win32"):
+            with patch.object(accelerators.subprocess, "run", return_value=fake_result):
+                self.assertFalse(accelerators.wsl_vllm_available())
+
+    def test_vllm_version_returns_none_when_unavailable(self):
+        with patch.object(accelerators, "wsl_vllm_available", return_value=False):
+            self.assertIsNone(accelerators.wsl_vllm_version())
+
+    def test_vllm_version_reads_stdout_when_available(self):
+        # Two-shot: ``wsl_vllm_available`` runs the import-check
+        # subprocess, then ``wsl_vllm_version`` runs a second subprocess
+        # to read ``__version__``. We stub the version-fetch result.
+        fake_version_result = MagicMock(returncode=0, stdout=b"0.6.3\n", stderr=b"")
+        with patch.object(accelerators, "wsl_vllm_available", return_value=True):
+            with patch.object(accelerators.sys, "platform", "win32"):
+                with patch.object(accelerators.subprocess, "run", return_value=fake_version_result):
+                    self.assertEqual(accelerators.wsl_vllm_version(), "0.6.3")
+
+
 class BackendCapabilitiesToDictTests(unittest.TestCase):
     """The frontend reads accelerator flags via ``/api/health``. Pin
     the serialized payload so a future field rename (or a forgetful
diff --git a/tests/test_vllm_wsl_install.py b/tests/test_vllm_wsl_install.py
new file mode 100644
index 0000000..c4429ba
--- /dev/null
+++ b/tests/test_vllm_wsl_install.py
@@ -0,0 +1,128 @@
+"""Tests for FU-056 Phase 8 install-vllm-wsl endpoints.
+
+The install itself runs in a background thread + shells out to
+``wsl --``, which we don't want to actually execute under pytest
+(the wsl subprocess can take 5-15 min on a real host). These tests
+pin the route contract + the platform gate so the endpoint can't
+silently regress shape.
+"""
+
+from __future__ import annotations
+
+import sys
+import unittest
+from unittest.mock import patch
+
+from fastapi import FastAPI
+from fastapi.testclient import TestClient
+
+from backend_service.routes.setup.vllm_wsl import (
+    _INSTALL_PHASES,
+    _JOB,
+    _VllmWslJobState,
+    router as vllm_wsl_router,
+)
+
+
+def _make_app() -> FastAPI:
+    app = FastAPI()
+    app.include_router(vllm_wsl_router)
+    return app
+
+
+class VllmWslJobStateShapeTests(unittest.TestCase):
+    def test_to_dict_exposes_install_panel_fields(self):
+        # The shared InstallLogPanel reads these keys; pin them so a
+        # backend refactor can't silently break the frontend renderer.
+        state = _VllmWslJobState()
+        payload = state.to_dict()
+        for key in (
+            "id",
+            "phase",
+            "message",
+            "packageCurrent",
+            "packageIndex",
+            "packageTotal",
+            "percent",
+            "targetDir",
+            "error",
+            "startedAt",
+            "finishedAt",
+            "attempts",
+            "done",
+        ):
+            self.assertIn(key, payload, f"{key} missing from to_dict()")
+
+    def test_phases_match_documented_step_order(self):
+        # Five user-visible steps: preflight (CUDA check), venv,
+        # pip-upgrade, pip-vllm (the long one), verify (import works).
+        self.assertEqual(
+            _INSTALL_PHASES,
+            ("preflight", "venv", "pip-upgrade", "pip-vllm", "verify"),
+        )
+
+
+class VllmWslEndpointTests(unittest.TestCase):
+    """The POST endpoint rejects on non-Windows hosts with HTTP 400.
+    Status GET is always allowed so polling works after a Windows
+    user runs the install, even if their dev box accidentally swaps
+    platforms mid-poll."""
+
+    def setUp(self) -> None:
+        self.client = TestClient(_make_app())
+        # Reset the singleton job so test ordering doesn't matter.
+        # The module-level state is the only contract the frontend
+        # talks to — leaving "done" state from a previous test would
+        # taint the next start.
+        _JOB.id = ""
+        _JOB.phase = "idle"
+        _JOB.message = ""
+        _JOB.package_current = None
+        _JOB.package_index = 0
+        _JOB.package_total = len(_INSTALL_PHASES)
+        _JOB.percent = 0.0
+        _JOB.target_dir = None
+        _JOB.error = None
+        _JOB.started_at = 0.0
+        _JOB.finished_at = 0.0
+        _JOB.attempts = []
+        _JOB.done = False
+
+    def test_post_rejects_off_windows(self):
+        with patch.object(sys, "platform", "linux"):
+            response = self.client.post("/api/setup/install-vllm-wsl")
+        self.assertEqual(response.status_code, 400)
+        body = response.json()
+        # ``localized_detail`` wraps the message in {message, localized,
+        # locale} — the user-facing string is in ``message``.
+        detail = body.get("detail")
+        self.assertIsInstance(detail, dict)
+        self.assertIn("Windows", detail.get("message", ""))
+
+    def test_status_returns_idle_when_no_install_started(self):
+        response = self.client.get("/api/setup/install-vllm-wsl/status")
+        self.assertEqual(response.status_code, 200)
+        payload = response.json()
+        self.assertEqual(payload["phase"], "idle")
+        self.assertFalse(payload["done"])
+
+    def test_post_on_windows_returns_job_state(self):
+        # Patch the background worker so the test doesn't actually shell
+        # out to ``wsl --`` (which would hang or fail on CI). The thread
+        # still starts but the worker function is a no-op.
+        from backend_service.routes.setup import vllm_wsl as module
+
+        with patch.object(sys, "platform", "win32"):
+            with patch.object(module, "_job_worker", lambda: None):
+                response = self.client.post("/api/setup/install-vllm-wsl")
+
+        self.assertEqual(response.status_code, 200)
+        payload = response.json()
+        self.assertEqual(payload["phase"], "preflight")
+        self.assertEqual(payload["packageTotal"], len(_INSTALL_PHASES))
+        self.assertEqual(payload["targetDir"], "~/.chaosengine/vllm-venv")
+        self.assertTrue(payload["id"].startswith("vllm-wsl-"))
+
+
+if __name__ == "__main__":
+    unittest.main()

From b0f00ca14fa73f554f664f66ac8729b620d402bb Mon Sep 17 00:00:00 2001
From: Cryptopoly <31970407+cryptopoly@users.noreply.github.com>
Date: Sun, 17 May 2026 17:21:41 +0100
Subject: [PATCH 09/15] feat: vLLM WSL bridge engine + routing (FU-056 Phase 8
 follow-up)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Completes the WSL bridge so Windows users get transparent vLLM
inference. A model load with backend=vllm on a Windows host where
wslVllm is installed spawns the OpenAI-compatible server inside
the WSL Ubuntu venv and proxies /v1/chat/completions through it.
No user action beyond clicking "Install vLLM in WSL" once.

Three pieces:

1. **VllmWslEngine** (backend_service/inference/vllm_wsl_engine.py):
   HTTP-bridge engine modelled on MtplxEngine. Subprocess shape:
       wsl -- ~/.chaosengine/vllm-venv/bin/python
           -m vllm.entrypoints.openai.api_server
           --model <ref> --host 127.0.0.1 --port <free>
           --max-model-len <ctx> --trust-remote-code

   WSL2 forwards localhost-bound ports to the Windows host by default,
   so the Windows backend reaches the listener at 127.0.0.1:<port>
   without any port-forward ceremony. Implements both generate() and
   stream_generate() so the existing chat surface stream path works
   end to end.

2. **windows_path_to_wsl helper**: a local model at
   C:\Users\Dan\AI_Models\Qwen3-7B gets translated to
   /mnt/c/Users/Dan/AI_Models/Qwen3-7B before being passed to vLLM,
   so a Windows-side download is reachable from inside WSL. HF repo
   ids (Qwen/Qwen3.5-7B) pass through unchanged - vLLM downloads them
   into its WSL-native HF cache, which avoids the ~10x IO penalty
   of /mnt/c-based cache reads.

3. **Routing** (backend_service/inference/controller.py): when
   ``hint == "vllm"`` the controller now prefers VllmWslEngine on
   Windows + wslVllmAvailable=True, falling through to the in-process
   VLLMEngine on Linux. On Windows boxes without the bridge, the
   error message points the user at Diagnostics → WSL2 vLLM bridge
   instead of the bare "pip install vllm" hint that doesn't work on
   Windows.
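
   The routing rule in item 3 can be sketched as a standalone
   function. This is an illustration only, not the controller code:
   pick_vllm_route and the "vllm-native" label are hypothetical names
   invented for the sketch; only "vllm-wsl" matches
   VllmWslEngine.engine_name.

```python
def pick_vllm_route(
    platform: str,
    wsl_vllm_available: bool,
    vllm_available: bool,
) -> str:
    """Return a label for the engine chosen when hint == "vllm".

    Hypothetical helper: mirrors the order of checks in the routing
    change, but returns labels instead of constructing engines.
    """
    # Windows with the bridge installed: prefer the WSL engine.
    if platform == "win32" and wsl_vllm_available:
        return "vllm-wsl"
    # Otherwise fall through to the in-process engine if importable.
    # "vllm-native" is an assumed label, not a name from the codebase.
    if vllm_available:
        return "vllm-native"
    # Neither route works: tailor the hint to the platform.
    if platform == "win32":
        raise RuntimeError(
            "vLLM backend requested but not installed. On Windows, "
            "install vLLM into WSL via Diagnostics → WSL2 vLLM bridge."
        )
    raise RuntimeError(
        "vLLM backend requested but not installed. "
        "Install with: pip install vllm (Linux + CUDA only)."
    )
```

   The WSL check runs first, so a Windows host with the bridge
   installed never reaches the in-process import path.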

Speculative decoding via the WSL bridge isn't wired yet - the
in-process VLLMEngine uses vllm.LLM's speculative_config= kwarg, but
the OpenAI server entry-point uses --speculative-model /
--num-speculative-tokens which need separate wiring. The runtime
note honestly flags the gap rather than silently dropping requests.

Tests: 13 new in test_vllm_wsl_engine.py covering:
  - windows_path_to_wsl matrix (backslash, forward-slash, drive
    casing, WSL passthrough, repo-id passthrough, UNC, relative)
  - load_model platform gate (off-Windows rejects)
  - load_model capability gate (wslVllm missing rejects)
  - argv composition (every required vllm flag present + ordered)
  - happy-path lifecycle (Popen called once, /health polled,
    LoadedModelInfo populated correctly, pid reachable)
  - path translation on a Windows model path
---
 backend_service/inference/controller.py      |  19 +
 backend_service/inference/vllm_wsl_engine.py | 540 +++++++++++++++++++
 tests/test_vllm_wsl_engine.py                | 282 ++++++++++
 3 files changed, 841 insertions(+)
 create mode 100644 backend_service/inference/vllm_wsl_engine.py
 create mode 100644 tests/test_vllm_wsl_engine.py

diff --git a/backend_service/inference/controller.py b/backend_service/inference/controller.py
index 88ce505..576c4b2 100644
--- a/backend_service/inference/controller.py
+++ b/backend_service/inference/controller.py
@@ -522,9 +522,28 @@ def _select_engine(
                 "Install with: brew install llama.cpp"
             )
         if hint == "vllm":
+            # FU-056 Phase 8: prefer the WSL bridge on Windows hosts.
+            # vLLM ships no native Windows wheels, so the in-process
+            # ``VLLMEngine`` can't import vllm on Windows — the WSL
+            # bridge engine spawns vLLM's OpenAI server inside the
+            # managed venv at ``~/.chaosengine/vllm-venv`` and proxies
+            # the HTTP surface. On Linux + CUDA the native
+            # ``VLLMEngine`` stays preferred (no subprocess overhead).
+            import sys
+            if sys.platform == "win32" and self.capabilities.wslVllmAvailable:
+                from backend_service.inference.vllm_wsl_engine import VllmWslEngine
+                return VllmWslEngine(self.capabilities)
             if self.capabilities.vllmAvailable:
                 from backend_service.vllm_engine import VLLMEngine
                 return VLLMEngine(self.capabilities)
+            # Neither route works. Tailor the error to the platform —
+            # Windows users get the WSL bridge hint, Linux users get
+            # the pip-install hint.
+            if sys.platform == "win32":
+                raise RuntimeError(
+                    "vLLM backend requested but not installed. On Windows, "
+                    "install vLLM into WSL via Diagnostics → WSL2 vLLM bridge."
+                )
             raise RuntimeError(
                 "vLLM backend requested but not installed. "
                 "Install with: pip install vllm (Linux + CUDA only)."
diff --git a/backend_service/inference/vllm_wsl_engine.py b/backend_service/inference/vllm_wsl_engine.py
new file mode 100644
index 0000000..0d70e3c
--- /dev/null
+++ b/backend_service/inference/vllm_wsl_engine.py
@@ -0,0 +1,540 @@
+"""vLLM-in-WSL inference engine (FU-056 Phase 8 follow-up).
+
+vLLM ships no native Windows wheels. Windows users with the WSL2
+bridge installed (Phase 8 foundation) get vLLM access through this
+engine: it spawns vLLM's OpenAI-compatible HTTP server inside the
+WSL Ubuntu venv, then proxies ``/v1/chat/completions`` from the
+Windows-side backend.
+
+Architecturally identical to ``MtplxEngine`` (subprocess + HTTP
+proxy + ``_wait_for_server`` poll loop) but the command prefix is
+``wsl -- ~/.chaosengine/vllm-venv/bin/python -m
+vllm.entrypoints.openai.api_server`` instead of the host-native
+mtplx binary.
+
+WSL2 networking lets the Windows side reach the WSL listener on
+``127.0.0.1:<port>`` transparently: in the default NAT mode, WSL2
+forwards localhost-bound ports to the Windows host
+(``localhostForwarding``). No port forwarding needed.
+
+Path translation: a model loaded from a Windows-side directory
+(e.g. ``C:\\Users\\Dan\\AI_Models\\Qwen3-7B``) needs to be
+addressed as ``/mnt/c/Users/Dan/AI_Models/Qwen3-7B`` when vLLM
+runs inside WSL. HF repo ids (``Qwen/Qwen3.5-7B``) pass through
+unchanged — vLLM downloads them into the WSL HF cache. The Windows
+HF cache is reachable via ``/mnt/c/...`` but its NTFS-on-/mnt/c
+performance is ~10× slower than the WSL ext4 home; we deliberately
+let vLLM keep its own HF cache inside WSL.
+"""
+
+from __future__ import annotations
+
+import json
+import re
+import subprocess
+import sys
+import tempfile
+import time
+import urllib.error
+import urllib.request
+from collections.abc import Callable, Iterator
+from pathlib import Path
+from typing import Any
+
+from backend_service.inference._constants import (
+    DEFAULT_LLAMA_TIMEOUT_SECONDS,
+    WORKSPACE_ROOT,
+)
+from backend_service.inference._utils import (
+    _find_open_port,
+    _http_json,
+    _normalize_message_content,
+    _now_label,
+    _read_text_tail,
+)
+from backend_service.inference.base import (
+    BackendCapabilities,
+    BaseInferenceEngine,
+    GenerationResult,
+    LoadedModelInfo,
+    StreamChunk,
+)
+
+
+# Same path the Phase 8 install endpoint writes into. Centralised
+# here so the engine + the installer never drift out of sync.
+_WSL_VLLM_VENV_PATH = "~/.chaosengine/vllm-venv"
+
+
+def windows_path_to_wsl(path: str) -> str:
+    """Translate a Windows-style path to its WSL ``/mnt/<drive>`` form.
+
+    ``C:\\Users\\Dan\\AI_Models\\Qwen3-7B`` →
+    ``/mnt/c/Users/Dan/AI_Models/Qwen3-7B``.
+
+    Pass-throughs:
+      - Already-WSL paths (start with ``/``): returned unchanged.
+      - Forward-slash Windows paths (``C:/Users/...``): handled
+        symmetrically to backslash form.
+      - Non-path strings (HF repo ids, URLs, model names): returned
+        unchanged. The detector is conservative — only translate
+        strings that look like absolute Windows paths.
+    """
+    if not path:
+        return path
+    # Already a POSIX path; the user passed a WSL-native string.
+    if path.startswith("/"):
+        return path
+    # Windows drive-letter pattern: ``X:\foo`` or ``X:/foo``. We don't
+    # translate UNC paths (``\\server\share``) — those are rare for
+    # local models and vLLM wouldn't load from them inside WSL anyway.
+    match = re.match(r"^([A-Za-z]):[\\/](.*)$", path)
+    if not match:
+        return path
+    drive = match.group(1).lower()
+    tail = match.group(2).replace("\\", "/")
+    return f"/mnt/{drive}/{tail}"
+
+
+class VllmWslEngine(BaseInferenceEngine):
+    """vLLM running inside the WSL isolated venv, proxied via HTTP.
+
+    Spawn shape: ``wsl -- <venv>/bin/python -m
+    vllm.entrypoints.openai.api_server --model X --host 127.0.0.1
+    --port N``. We talk to it through the existing ``urllib`` HTTP
+    helpers, same as MtplxEngine. The engine implements the standard
+    ``BaseInferenceEngine`` generate / stream_generate interface, so
+    upstream callers don't need a "is this WSL?" branch.
+    """
+
+    engine_name = "vllm-wsl"
+    engine_label = "vLLM (WSL bridge)"
+
+    def __init__(self, capabilities: BackendCapabilities) -> None:
+        self.capabilities = capabilities
+        self.loaded_model: LoadedModelInfo | None = None
+        self.process: subprocess.Popen[str] | None = None
+        self.port: int | None = None
+        self.log_path: Path | None = None
+        self.log_handle: Any = None
+
+    # ------------------------------------------------------------------
+    # Internal helpers
+    # ------------------------------------------------------------------
+
+    def _server_url(self, path: str) -> str:
+        if self.port is None:
+            raise RuntimeError("vLLM WSL server is not running.")
+        return f"http://127.0.0.1:{self.port}{path}"
+
+    def _build_wsl_command(
+        self,
+        *,
+        model_arg: str,
+        port: int,
+        max_model_len: int,
+    ) -> list[str]:
+        """Compose the ``wsl -- python -m vllm.entrypoints...`` argv.
+
+        Pulled out for tests + so the comment about each flag lives
+        next to the flag rather than buried in load_model.
+        """
+        return [
+            "wsl",
+            "--",
+            f"{_WSL_VLLM_VENV_PATH}/bin/python",
+            "-m",
+            "vllm.entrypoints.openai.api_server",
+            "--model",
+            model_arg,
+            # ``--host 127.0.0.1`` keeps vLLM listening only on the
+            # loopback. WSL2 forwards localhost-bound ports to the
+            # Windows host by default, so the Windows backend reaches it
+            # without port forwarding and the model stays off the LAN.
+            "--host",
+            "127.0.0.1",
+            "--port",
+            str(port),
+            # ``--max-model-len`` is vLLM's name for the context window
+            # cap. Defaults to the length derived from the model config,
+            # which can be too large for available VRAM. We pass through
+            # the user-selected ``contextTokens`` so the launch settings
+            # actually take effect.
+            "--max-model-len",
+            str(max_model_len),
+            # Trust the model config without prompting. vLLM's default
+            # is False, which throws ``ValueError: trust_remote_code``
+            # for repos like Qwen3-VL that ship custom modeling code.
+            "--trust-remote-code",
+        ]
+
+    def _cleanup_process(self) -> None:
+        if self.process is not None and self.process.poll() is None:
+            try:
+                self.process.terminate()
+            except (ProcessLookupError, OSError):
+                pass
+            try:
+                self.process.wait(timeout=10)
+            except subprocess.TimeoutExpired:
+                # vLLM loads big models — the graceful terminate can
+                # take a few seconds while it tears down CUDA tensors.
+                # If it's still alive after 10 s, escalate to kill().
+                try:
+                    self.process.kill()
+                except (ProcessLookupError, OSError):
+                    pass
+                try:
+                    self.process.wait(timeout=5)
+                except subprocess.TimeoutExpired:
+                    pass
+        self.process = None
+        self.port = None
+        if self.log_handle is not None:
+            try:
+                self.log_handle.close()
+            except OSError:
+                pass
+        self.log_handle = None
+
+    def process_pid(self) -> int | None:
+        if self.process is None or self.process.poll() is not None:
+            return None
+        return int(self.process.pid)
+
+    def _wait_for_server(self) -> None:
+        """Poll ``/health`` until vLLM accepts requests, or the subprocess dies.
+
+        vLLM's startup is slow (~30-90 s for a 7B model on cold cache)
+        because it builds the CUDA graph + warms KV blocks. We give it
+        the standard llama-timeout budget and surface the captured log
+        if it dies before becoming ready.
+        """
+        deadline = time.time() + DEFAULT_LLAMA_TIMEOUT_SECONDS
+        last_error = "vLLM (WSL) did not become ready."
+        while time.time() < deadline:
+            if self.process is not None and self.process.poll() is not None:
+                logs = _read_text_tail(self.log_path)
+                raise RuntimeError(logs or "vLLM (WSL) exited during startup.")
+            try:
+                _http_json(self._server_url("/health"), timeout=2.0)
+                return
+            except Exception as exc:  # noqa: BLE001 — best-effort poll
+                last_error = str(exc)
+            time.sleep(1.0)
+        logs = _read_text_tail(self.log_path)
+        raise RuntimeError(logs if logs else last_error)
+
+    # ------------------------------------------------------------------
+    # BaseInferenceEngine interface
+    # ------------------------------------------------------------------
+
+    def load_model(
+        self,
+        *,
+        model_ref: str,
+        model_name: str,
+        canonical_repo: str | None,
+        source: str,
+        backend: str,
+        path: str | None,
+        runtime_target: str | None,
+        cache_strategy: str,
+        cache_bits: int,
+        fp16_layers: int,
+        fused_attention: bool,
+        fit_model_in_memory: bool,
+        context_tokens: int,
+        speculative_decoding: bool = False,
+        tree_budget: int = 0,
+        progress_callback: Callable[[dict[str, Any]], None] | None = None,
+    ) -> LoadedModelInfo:
+        if sys.platform != "win32":
+            raise RuntimeError(
+                "vLLM WSL bridge is Windows-only. Use the native vLLM "
+                "engine on Linux."
+            )
+        if not self.capabilities.wslVllmAvailable:
+            raise RuntimeError(
+                "vLLM isn't installed in WSL. Install it from the "
+                "Diagnostics → WSL2 vLLM bridge panel."
+            )
+
+        self.unload_model()
+
+        self.port = _find_open_port()
+
+        # Pick the most precise model reference available:
+        #   1. local path (translated to /mnt/c/... if Windows-style)
+        #   2. runtime_target (catalog override)
+        #   3. canonical HF repo (vLLM downloads to its WSL HF cache)
+        #   4. model_ref (last resort — usually equal to #3)
+        if path:
+            model_arg = windows_path_to_wsl(path)
+        elif runtime_target:
+            model_arg = windows_path_to_wsl(runtime_target)
+        else:
+            model_arg = canonical_repo or model_ref
+
+        command = self._build_wsl_command(
+            model_arg=model_arg,
+            port=self.port,
+            max_model_len=context_tokens,
+        )
+
+        if progress_callback:
+            progress_callback({
+                "phase": "loading",
+                "percent": 10.0,
+                "message": f"Spawning vLLM in WSL for {model_name}...",
+            })
+
+        temp_log = tempfile.NamedTemporaryFile(
+            prefix="chaosengine-vllm-wsl-", suffix=".log", delete=False
+        )
+        temp_log.close()
+        self.log_path = Path(temp_log.name)
+        self.log_handle = self.log_path.open("a", encoding="utf-8")
+
+        self.process = subprocess.Popen(
+            command,
+            cwd=str(WORKSPACE_ROOT),
+            stdout=self.log_handle,
+            stderr=self.log_handle,
+            text=True,
+        )
+
+        try:
+            self._wait_for_server()
+        except RuntimeError:
+            self._cleanup_process()
+            raise
+
+        runtime_note = (
+            f"vLLM running inside WSL ({self.capabilities.wslDistroName or 'Ubuntu'}) "
+            f"venv at {_WSL_VLLM_VENV_PATH}."
+        )
+        if self.capabilities.wslVllmVersion:
+            runtime_note = (
+                f"vLLM {self.capabilities.wslVllmVersion} running inside WSL "
+                f"({self.capabilities.wslDistroName or 'Ubuntu'})."
+            )
+        # Speculative decoding via the WSL bridge isn't wired yet — the
+        # in-process VLLMEngine handles it via ``speculative_config=``,
+        # but the OpenAI server entry-point uses a different surface
+        # (``--speculative-model`` / ``--num-speculative-tokens``) that
+        # we'll add in a follow-up. Note the gap honestly in the
+        # runtime note rather than silently dropping the request.
+        if speculative_decoding:
+            runtime_note += (
+                " Speculative decoding requested but not yet supported "
+                "via the WSL bridge — running with standard decoding."
+            )
+
+        if progress_callback:
+            progress_callback({
+                "phase": "ready",
+                "percent": 100.0,
+                "message": "vLLM (WSL) ready.",
+            })
+
+        self.loaded_model = LoadedModelInfo(
+            ref=model_ref,
+            name=model_name,
+            canonicalRepo=canonical_repo,
+            backend=backend,
+            source=source,
+            engine=self.engine_name,
+            cacheStrategy=cache_strategy,
+            cacheBits=cache_bits,
+            fp16Layers=fp16_layers,
+            fusedAttention=fused_attention,
+            fitModelInMemory=fit_model_in_memory,
+            contextTokens=context_tokens,
+            loadedAt=_now_label(),
+            path=path,
+            runtimeTarget=model_arg,
+            runtimeNote=runtime_note,
+            speculativeDecoding=False,
+        )
+        return self.loaded_model
+
+    def unload_model(self) -> None:
+        self._cleanup_process()
+        self.loaded_model = None
+
+    def generate(
+        self,
+        *,
+        prompt: str,
+        history: list[dict[str, Any]],
+        system_prompt: str | None,
+        max_tokens: int,
+        temperature: float,
+        images: list[str] | None = None,
+        tools: list[dict[str, Any]] | None = None,
+        samplers: dict[str, Any] | None = None,
+        reasoning_effort: str | None = None,
+        json_schema: dict[str, Any] | None = None,
+    ) -> GenerationResult:
+        if self.loaded_model is None:
+            raise RuntimeError("No model is loaded.")
+        if self.process is None or self.process.poll() is not None:
+            logs = _read_text_tail(self.log_path)
+            raise RuntimeError(logs or "The vLLM (WSL) server is not running.")
+
+        messages: list[dict[str, Any]] = []
+        if system_prompt:
+            messages.append({"role": "system", "content": system_prompt})
+        for message in history:
+            role = message.get("role")
+            if role not in {"system", "user", "assistant", "tool"}:
+                continue
+            messages.append({
+                "role": role,
+                "content": _normalize_message_content(message.get("text", "")),
+            })
+        messages.append({"role": "user", "content": prompt})
+
+        started_at = time.perf_counter()
+        payload: dict[str, Any] = {
+            "model": self.loaded_model.ref,
+            "messages": messages,
+            "temperature": temperature,
+            "max_tokens": max_tokens,
+            "stream": False,
+        }
+        if tools:
+            payload["tools"] = tools
+
+        try:
+            response = _http_json(
+                self._server_url("/v1/chat/completions"),
+                payload=payload,
+                timeout=DEFAULT_LLAMA_TIMEOUT_SECONDS,
+            )
+        except urllib.error.HTTPError as exc:
+            detail = exc.read().decode("utf-8", errors="ignore")
+            raise RuntimeError(detail or str(exc)) from exc
+        except urllib.error.URLError as exc:
+            raise RuntimeError(str(exc.reason)) from exc
+
+        elapsed = max(time.perf_counter() - started_at, 1e-6)
+        choice = (response.get("choices") or [{}])[0]
+        message = choice.get("message") or {}
+        usage = response.get("usage") or {}
+        completion_tokens = int(usage.get("completion_tokens") or 0)
+        prompt_tokens = int(usage.get("prompt_tokens") or 0)
+        text = str(message.get("content") or "")
+
+        return GenerationResult(
+            text=text,
+            finishReason=str(choice.get("finish_reason") or "stop"),
+            promptTokens=prompt_tokens,
+            completionTokens=completion_tokens,
+            totalTokens=int(usage.get("total_tokens") or (prompt_tokens + completion_tokens)),
+            tokS=round(completion_tokens / elapsed, 1) if completion_tokens else 0.0,
+            responseSeconds=round(elapsed, 2),
+            runtimeNote=self.loaded_model.runtimeNote,
+        )
+
+    def stream_generate(
+        self,
+        *,
+        prompt: str,
+        history: list[dict[str, Any]],
+        system_prompt: str | None,
+        max_tokens: int,
+        temperature: float,
+        images: list[str] | None = None,
+        tools: list[dict[str, Any]] | None = None,
+        thinking_mode: str | None = None,
+        samplers: dict[str, Any] | None = None,
+        reasoning_effort: str | None = None,
+        json_schema: dict[str, Any] | None = None,
+    ) -> Iterator[StreamChunk]:
+        if self.loaded_model is None:
+            raise RuntimeError("No model is loaded.")
+        if self.process is None or self.process.poll() is not None:
+            logs = _read_text_tail(self.log_path)
+            raise RuntimeError(logs or "The vLLM (WSL) server is not running.")
+
+        messages: list[dict[str, Any]] = []
+        if system_prompt:
+            messages.append({"role": "system", "content": system_prompt})
+        for message in history:
+            role = message.get("role")
+            if role not in {"system", "user", "assistant", "tool"}:
+                continue
+            messages.append({
+                "role": role,
+                "content": _normalize_message_content(message.get("text", "")),
+            })
+        messages.append({"role": "user", "content": prompt})
+
+        payload: dict[str, Any] = {
+            "model": self.loaded_model.ref,
+            "messages": messages,
+            "temperature": temperature,
+            "max_tokens": max_tokens,
+            "stream": True,
+            "stream_options": {"include_usage": True},  # request the final usage chunk
+        }
+        if tools:
+            payload["tools"] = tools
+
+        url = self._server_url("/v1/chat/completions")
+        data = json.dumps(payload).encode("utf-8")
+        headers = {"Content-Type": "application/json", "Accept": "text/event-stream"}
+        request = urllib.request.Request(url, data=data, headers=headers, method="POST")
+        try:
+            resp = urllib.request.urlopen(request, timeout=DEFAULT_LLAMA_TIMEOUT_SECONDS)
+        except urllib.error.HTTPError as exc:
+            detail = exc.read().decode("utf-8", errors="ignore")
+            raise RuntimeError(detail or str(exc)) from exc
+        except urllib.error.URLError as exc:
+            raise RuntimeError(str(exc.reason)) from exc
+
+        finish_reason = "stop"
+        prompt_tokens = 0
+        completion_tokens = 0
+        started_at = time.perf_counter()
+
+        with resp:
+            for raw_line in resp:
+                line = raw_line.decode("utf-8", errors="ignore").strip()
+                if not line or not line.startswith("data:"):
+                    continue
+                data_str = line[len("data:"):].strip()
+                if data_str == "[DONE]":
+                    break
+                try:
+                    chunk = json.loads(data_str)
+                except json.JSONDecodeError:
+                    continue
+                # The usage-only chunk has empty ``choices``; read it first.
+                usage = chunk.get("usage") or {}
+                if usage:
+                    prompt_tokens = int(usage.get("prompt_tokens") or prompt_tokens)
+                    completion_tokens = int(usage.get("completion_tokens") or completion_tokens)
+                choices = chunk.get("choices") or []
+                if not choices:
+                    continue
+                choice = choices[0]
+                delta = choice.get("delta") or {}
+                text_delta = delta.get("content")
+                if text_delta:
+                    yield StreamChunk(text=text_delta)
+                # vLLM emits ``finish_reason`` on the last delta only.
+                finish_reason = str(choice.get("finish_reason") or finish_reason)
+
+        elapsed = max(time.perf_counter() - started_at, 1e-6)
+        yield StreamChunk(
+            done=True,
+            finish_reason=finish_reason,
+            prompt_tokens=prompt_tokens,
+            completion_tokens=completion_tokens,
+            total_tokens=prompt_tokens + completion_tokens,
+            tok_s=round(completion_tokens / elapsed, 1) if completion_tokens else 0.0,
+            runtime_note=self.loaded_model.runtimeNote if self.loaded_model else None,
+        )
diff --git a/tests/test_vllm_wsl_engine.py b/tests/test_vllm_wsl_engine.py
new file mode 100644
index 0000000..f516f70
--- /dev/null
+++ b/tests/test_vllm_wsl_engine.py
@@ -0,0 +1,282 @@
+"""Tests for the FU-056 Phase 8 follow-up: VllmWslEngine.
+
+Pinned at the unit level — we don't actually shell out to ``wsl --``
+or talk to a real vLLM server. ``subprocess.Popen`` + ``_http_json``
+are mocked so the engine's lifecycle (spawn → wait_for_server →
+generate → unload) gets exercised without a 90-second model load.
+
+Two layers:
+  - ``windows_path_to_wsl`` is a pure function and gets exhaustive
+    coverage against the path-flavour matrix.
+  - The engine class gets happy-path + Windows-only gate + missing-
+    capability rejection tests.
+"""
+
+from __future__ import annotations
+
+import sys
+import unittest
+from unittest.mock import MagicMock, patch
+
+from backend_service.inference.base import BackendCapabilities
+from backend_service.inference.vllm_wsl_engine import (
+    VllmWslEngine,
+    windows_path_to_wsl,
+)
+
+
+class WindowsPathToWslTests(unittest.TestCase):
+    """Pure path-translation matrix — no side effects."""
+
+    def test_translates_backslash_drive_letter_path(self):
+        self.assertEqual(
+            windows_path_to_wsl(r"C:\Users\Dan\AI_Models\Qwen3-7B"),
+            "/mnt/c/Users/Dan/AI_Models/Qwen3-7B",
+        )
+
+    def test_translates_forward_slash_drive_letter_path(self):
+        self.assertEqual(
+            windows_path_to_wsl("C:/Users/Dan/AI_Models/Qwen3-7B"),
+            "/mnt/c/Users/Dan/AI_Models/Qwen3-7B",
+        )
+
+    def test_lowercases_drive_letter(self):
+        # WSL expects ``/mnt/c/...`` not ``/mnt/C/...``.
+        self.assertEqual(
+            windows_path_to_wsl(r"D:\Models"),
+            "/mnt/d/Models",
+        )
+
+    def test_passes_through_existing_wsl_path(self):
+        self.assertEqual(
+            windows_path_to_wsl("/home/dan/models/Qwen3-7B"),
+            "/home/dan/models/Qwen3-7B",
+        )
+
+    def test_passes_through_hf_repo_id(self):
+        # vLLM accepts ``org/name`` directly and downloads to its
+        # HF cache. We mustn't mangle it into a path.
+        self.assertEqual(
+            windows_path_to_wsl("Qwen/Qwen3.5-7B"),
+            "Qwen/Qwen3.5-7B",
+        )
+
+    def test_passes_through_empty_string(self):
+        self.assertEqual(windows_path_to_wsl(""), "")
+
+    def test_passes_through_unc_path(self):
+        # UNC paths (\\server\share) aren't translated: vLLM wouldn't
+        # load from them inside WSL anyway. Just don't crash.
+        unc = r"\\server\share\models"
+        self.assertEqual(windows_path_to_wsl(unc), unc)
+
+    def test_passes_through_relative_path(self):
+        # Relative paths have no drive letter; leave alone.
+        self.assertEqual(windows_path_to_wsl(r"models\Qwen3"), r"models\Qwen3")
+
+
+def _make_caps(*, wsl_vllm: bool = True, distro: str | None = "Ubuntu-24.04") -> BackendCapabilities:
+    """Build a capabilities snapshot with the WSL bridge in the
+    requested state. Default = "ready"."""
+    return BackendCapabilities(
+        pythonExecutable="/x/python",
+        mlxAvailable=False,
+        mlxLmAvailable=False,
+        mlxUsable=False,
+        wsl2Available=wsl_vllm,
+        wslDistroName=distro,
+        wslCudaAvailable=wsl_vllm,
+        wslVllmAvailable=wsl_vllm,
+        wslVllmVersion="0.6.3" if wsl_vllm else None,
+    )
+
+
+class VllmWslEngineGatesTests(unittest.TestCase):
+    """Pre-spawn validation: platform + capability checks should
+    raise *before* any subprocess is touched."""
+
+    def test_load_rejects_off_windows(self):
+        engine = VllmWslEngine(_make_caps(wsl_vllm=True))
+        with patch.object(sys, "platform", "linux"):
+            with self.assertRaises(RuntimeError) as ctx:
+                engine.load_model(
+                    model_ref="Qwen/Qwen3.5-7B",
+                    model_name="Qwen3.5-7B",
+                    canonical_repo="Qwen/Qwen3.5-7B",
+                    source="catalog",
+                    backend="vllm",
+                    path=None,
+                    runtime_target=None,
+                    cache_strategy="native",
+                    cache_bits=0,
+                    fp16_layers=0,
+                    fused_attention=False,
+                    fit_model_in_memory=True,
+                    context_tokens=8192,
+                )
+        self.assertIn("Windows-only", str(ctx.exception))
+
+    def test_load_rejects_when_wsl_vllm_missing(self):
+        engine = VllmWslEngine(_make_caps(wsl_vllm=False))
+        with patch.object(sys, "platform", "win32"):
+            with self.assertRaises(RuntimeError) as ctx:
+                engine.load_model(
+                    model_ref="Qwen/Qwen3.5-7B",
+                    model_name="Qwen3.5-7B",
+                    canonical_repo="Qwen/Qwen3.5-7B",
+                    source="catalog",
+                    backend="vllm",
+                    path=None,
+                    runtime_target=None,
+                    cache_strategy="native",
+                    cache_bits=0,
+                    fp16_layers=0,
+                    fused_attention=False,
+                    fit_model_in_memory=True,
+                    context_tokens=8192,
+                )
+        # The error points the user at the install panel rather than
+        # leaving them guessing what to do next.
+        self.assertIn("WSL2 vLLM bridge", str(ctx.exception))
+
+
+class VllmWslEngineCommandTests(unittest.TestCase):
+    """Argv composition — checks each flag is present and ordered
+    so the WSL command stays valid for the upstream parser."""
+
+    def test_build_command_includes_required_flags(self):
+        engine = VllmWslEngine(_make_caps())
+        command = engine._build_wsl_command(
+            model_arg="Qwen/Qwen3.5-7B",
+            port=8000,
+            max_model_len=8192,
+        )
+
+        # Prefix is the wsl entry-point + arg separator. Without ``--``
+        # the wsl CLI tries to interpret the rest as wsl options.
+        self.assertEqual(command[0], "wsl")
+        self.assertEqual(command[1], "--")
+
+        # The venv-bound Python invocation — relative to the WSL
+        # user's $HOME via the leading ~. ``wsl --`` expands the ~.
+        self.assertIn("~/.chaosengine/vllm-venv/bin/python", command)
+        self.assertIn("-m", command)
+        self.assertIn("vllm.entrypoints.openai.api_server", command)
+
+        # User-driven flags.
+        self.assertIn("--model", command)
+        self.assertIn("Qwen/Qwen3.5-7B", command)
+        self.assertIn("--port", command)
+        self.assertIn("8000", command)
+        self.assertIn("--max-model-len", command)
+        self.assertIn("8192", command)
+
+        # Safety: bound to loopback so the model isn't exposed to the
+        # LAN, and ``--trust-remote-code`` covers repos like Qwen3-VL.
+        self.assertIn("--host", command)
+        self.assertIn("127.0.0.1", command)
+        self.assertIn("--trust-remote-code", command)
+
+
+class VllmWslEngineLifecycleTests(unittest.TestCase):
+    """Happy-path lifecycle with the subprocess + HTTP probe mocked
+    out. We never actually shell out to wsl.exe."""
+
+    def test_load_spawns_subprocess_and_polls_health(self):
+        engine = VllmWslEngine(_make_caps())
+
+        # Fake the subprocess: poll() returns None (still running),
+        # PID is a known int. ``terminate`` / ``wait`` are mocked so
+        # ``unload_model`` doesn't hang.
+        fake_proc = MagicMock()
+        fake_proc.poll.return_value = None
+        fake_proc.pid = 4242
+        fake_proc.wait.return_value = 0
+
+        with patch.object(sys, "platform", "win32"):
+            with patch(
+                "backend_service.inference.vllm_wsl_engine.subprocess.Popen",
+                return_value=fake_proc,
+            ) as popen_mock:
+                with patch(
+                    "backend_service.inference.vllm_wsl_engine._http_json",
+                    return_value={},  # ``/health`` returns OK immediately
+                ) as http_mock:
+                    with patch(
+                        "backend_service.inference.vllm_wsl_engine._find_open_port",
+                        return_value=8765,
+                    ):
+                        info = engine.load_model(
+                            model_ref="Qwen/Qwen3.5-7B",
+                            model_name="Qwen3.5-7B",
+                            canonical_repo="Qwen/Qwen3.5-7B",
+                            source="catalog",
+                            backend="vllm",
+                            path=None,
+                            runtime_target=None,
+                            cache_strategy="native",
+                            cache_bits=0,
+                            fp16_layers=0,
+                            fused_attention=False,
+                            fit_model_in_memory=True,
+                            context_tokens=8192,
+                        )
+
+        # Subprocess was spawned exactly once with the WSL argv.
+        popen_mock.assert_called_once()
+        spawned_argv = popen_mock.call_args.args[0]
+        self.assertEqual(spawned_argv[0], "wsl")
+
+        # Health probe was hit.
+        http_mock.assert_called()
+
+        # Loaded info reflects the spawn.
+        self.assertEqual(info.engine, "vllm-wsl")
+        self.assertEqual(info.ref, "Qwen/Qwen3.5-7B")
+        self.assertEqual(engine.port, 8765)
+        self.assertEqual(engine.process_pid(), 4242)
+
+    def test_load_translates_windows_path_in_runtime_target(self):
+        engine = VllmWslEngine(_make_caps())
+
+        fake_proc = MagicMock()
+        fake_proc.poll.return_value = None
+        fake_proc.pid = 7
+
+        with patch.object(sys, "platform", "win32"):
+            with patch(
+                "backend_service.inference.vllm_wsl_engine.subprocess.Popen",
+                return_value=fake_proc,
+            ) as popen_mock:
+                with patch(
+                    "backend_service.inference.vllm_wsl_engine._http_json",
+                    return_value={},
+                ):
+                    with patch(
+                        "backend_service.inference.vllm_wsl_engine._find_open_port",
+                        return_value=9000,
+                    ):
+                        engine.load_model(
+                            model_ref="Qwen/Qwen3.5-7B",
+                            model_name="Qwen3.5-7B",
+                            canonical_repo=None,
+                            source="local",
+                            backend="vllm",
+                            path=r"C:\Users\Dan\AI_Models\Qwen3-7B",
+                            runtime_target=None,
+                            cache_strategy="native",
+                            cache_bits=0,
+                            fp16_layers=0,
+                            fused_attention=False,
+                            fit_model_in_memory=True,
+                            context_tokens=8192,
+                        )
+
+        spawned_argv = popen_mock.call_args.args[0]
+        # The model arg should have been translated into the WSL
+        # /mnt/c/... form so vLLM can find the weights from inside WSL.
+        self.assertIn("/mnt/c/Users/Dan/AI_Models/Qwen3-7B", spawned_argv)
+
+
+if __name__ == "__main__":
+    unittest.main()

From 31acafe95cb1361841de77eed1296aea09fb0e21 Mon Sep 17 00:00:00 2001
From: Cryptopoly <31970407+cryptopoly@users.noreply.github.com>
Date: Sun, 17 May 2026 18:20:28 +0100
Subject: [PATCH 10/15] fix: vllm-wsl install preflight detects missing
 python3-venv (FU-056)

Caught during a live end-to-end test on a fresh Ubuntu 24.04 WSL
install: ``python3 -m venv ~/.chaosengine/vllm-venv`` fails with
``ensurepip is not available`` because Ubuntu 24.04 ships python3
without the venv module. Before this commit the user would see a
confusing error mid-install ("Failed to create the WSL venv. See
output above.") with the real fix buried in stderr.

Now the preflight step explicitly probes ``python3 -c 'import
ensurepip'`` after the CUDA check. When it fails, the install
endpoint surfaces the exact apt command:

    sudo apt update && sudo apt install -y python3-venv

instead of trying to create the venv and erroring out. Same
pattern as the existing NVIDIA-driver-not-found path: tell the
user what to do, don't pretend to recover.
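
The probe ordering described above can be sketched roughly as follows.
This is a minimal illustration, not the code in ``vllm_wsl.py``: the
``Runner`` callable, ``preflight`` and ``fake_runner`` are hypothetical
names standing in for ``_run_wsl_step`` and the job bookkeeping, and the
CUDA error string is abbreviated.

```python
from typing import Callable, Optional, Set, Tuple

# Hypothetical runner: takes a shell command, returns (exit_code, output).
Runner = Callable[[str], Tuple[int, str]]

VENV_HINT = (
    "python3-venv isn't installed in WSL. Open a WSL shell and run:\n"
    "    sudo apt update && sudo apt install -y python3-venv\n"
    "then retry this installer."
)


def preflight(run: Runner) -> Optional[str]:
    """Return None when both probes pass, else a user-facing error."""
    # (a) CUDA passthrough: nvidia-smi must exit 0 inside WSL.
    code, _ = run("nvidia-smi -L")
    if code != 0:
        return "CUDA isn't reachable inside WSL."  # abbreviated message
    # (b) ensurepip importable, which implies python3-venv is installed.
    code, _ = run("python3 -c 'import ensurepip' 2>&1")
    if code != 0:
        return VENV_HINT
    return None


def fake_runner(failing: Set[str]) -> Runner:
    """Test double: any command containing a 'failing' substring exits 1."""
    def run(cmd: str) -> Tuple[int, str]:
        return (1, "boom") if any(f in cmd for f in failing) else (0, "ok")
    return run
```

Running the CUDA check first keeps the existing failure message
unchanged for users who never get as far as the venv probe.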
---
 backend_service/routes/setup/vllm_wsl.py | 48 +++++++++++++++++++++---
 1 file changed, 42 insertions(+), 6 deletions(-)

diff --git a/backend_service/routes/setup/vllm_wsl.py b/backend_service/routes/setup/vllm_wsl.py
index 127c09f..c6e7517 100644
--- a/backend_service/routes/setup/vllm_wsl.py
+++ b/backend_service/routes/setup/vllm_wsl.py
@@ -173,17 +173,21 @@ def _job_worker() -> None:
     job.package_current = _PHASE_LABELS["preflight"]
     job.target_dir = _WSL_VLLM_VENV_PATH
 
-    # Step 1 — preflight. Confirm WSL responds + CUDA passthrough works
-    # before paying for the venv + pip download. Fails fast if the user
-    # tried to install on a box where ``nvidia-smi -L`` doesn't work
-    # inside WSL (the NVIDIA WSL driver kicker hasn't been installed
-    # on the Windows host).
+    # Step 1 — preflight. Two checks bundled into one attempt row so
+    # the user sees a single "checking prerequisites" step rather than
+    # a wall of green ticks for sub-probes:
+    #   (a) CUDA passthrough works (``nvidia-smi -L`` exits 0).
+    #   (b) python3-venv is available — Ubuntu 24.04 ships python3
+    #       without ``ensurepip``, so ``python3 -m venv X`` fails with
+    #       "ensurepip is not available" until ``python3.12-venv`` is
+    #       apt-installed. We surface that clearly because the fix is
+    #       a one-line sudo command outside our process boundary.
     code, output = _run_wsl_step(
         "nvidia-smi -L",
         _STEP_TIMEOUTS_SEC["preflight"],
     )
-    _push_attempt(job, "preflight", ok=(code == 0), output=output)
     if code != 0:
+        _push_attempt(job, "preflight", ok=False, output=output)
         job.phase = "error"
         job.error = (
             "CUDA isn't reachable inside WSL. Install the NVIDIA WSL "
@@ -193,6 +197,38 @@ def _job_worker() -> None:
         job.finished_at = time.time()
         job.done = True
         return
+
+    # python3-venv probe. ``python3 -c 'import ensurepip'`` exits 0 iff
+    # ensurepip is wired (i.e. python3-venv is installed). Cheaper than
+    # actually trying ``python3 -m venv /tmp/x`` and matching stderr.
+    code, venv_output = _run_wsl_step(
+        "python3 -c 'import ensurepip' 2>&1",
+        _STEP_TIMEOUTS_SEC["preflight"],
+    )
+    if code != 0:
+        _push_attempt(
+            job,
+            "preflight",
+            ok=False,
+            output=f"{output}\n\npython3-venv probe:\n{venv_output}",
+        )
+        job.phase = "error"
+        job.error = (
+            "python3-venv isn't installed in WSL. Open a WSL shell and run:\n"
+            "    sudo apt update && sudo apt install -y python3-venv\n"
+            "then retry this installer."
+        )
+        job.message = job.error
+        job.finished_at = time.time()
+        job.done = True
+        return
+
+    _push_attempt(
+        job,
+        "preflight",
+        ok=True,
+        output=f"{output}\n\npython3-venv: OK",
+    )
     _advance(job, 1)
 
     # Step 2 — venv. ``python3 -m venv`` is idempotent: if the dir

From c4f370153f010af2ef7febc0bec7f6c97a249d3d Mon Sep 17 00:00:00 2001
From: Cryptopoly <31970407+cryptopoly@users.noreply.github.com>
Date: Sun, 17 May 2026 18:49:06 +0100
Subject: [PATCH 11/15] =?UTF-8?q?fix:=20vLLM=20WSL=20bridge=20engine=20?=
 =?UTF-8?q?=E2=80=94=204=20issues=20caught=20by=20live=20e2e=20(FU-056=20P?=
 =?UTF-8?q?hase=208)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

End-to-end test validated against real CUDA + real vLLM 0.21.0 +
real WSL2 Ubuntu-24.04 on Windows + RTX 4090. Loaded
Qwen2.5-0.5B-Instruct in 96 s and generated "Paris." for the
prompt "The capital of France is" — 1.19 s HTTP round-trip from
the Windows backend into WSL and back.

Four fixes the live test surfaced, none of which would have been
caught by mocked unit tests:

1. **PATH plumbing through grandchild processes**: the engine
   subprocess inside vLLM (EngineCore) couldn't find ``ninja`` for
   flashinfer's JIT-compiled sampling kernels, even though it lived
   in the venv's bin/. The command builder now wraps the python
   invocation in ``bash -c`` so we can prepend
   ``~/.chaosengine/vllm-venv/bin`` to PATH explicitly. The PATH
   value is double-quoted because WSL2 interop appends the Windows
   PATH to the bash PATH, and that PATH contains entries with spaces
   (``/mnt/c/Program Files/NVIDIA…``) which otherwise word-split
   and fail with ``export: 'Files/NVIDIA': not a valid identifier``.

2. **vLLM 0.21+ flashinfer JIT escape hatches**: even with ninja
   reachable, flashinfer needs ``nvcc`` for the second compile
   stage. Setting ``VLLM_USE_FLASHINFER_SAMPLER=0`` +
   ``VLLM_ATTENTION_BACKEND=TORCH_SDPA`` routes through pre-built
   PyTorch kernels. ``--enforce-eager`` disables CUDA-graph
   compilation. Loses some perf but avoids the second JIT.

3. **/v1/models probe instead of /health**: vLLM's ``/health``
   returns 200 with an empty body, which tripped ``_http_json``'s
   ``json.loads`` and made ``_wait_for_server`` keep retrying until
   the timeout expired. ``/v1/models`` returns the loaded-model list
   as JSON, so the parse succeeds and we return on the first OK.

4. **shlex-quoted model arg**: a model path with spaces (e.g. a
   Windows-translated ``/mnt/c/My Models/Qwen3-7B``) would
   word-split through the bash -c parse without quoting. New test
   pins the round-trip.
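
The quoting in items 1 and 4 can be sketched as below. This is a
simplified illustration under stated assumptions, not the engine's
``_build_wsl_command``: ``build_wsl_argv`` and ``VENV`` are hypothetical
names, and the flashinfer env vars and extra flags are omitted. The
sketch spells the venv path with ``$HOME`` rather than ``~`` because
bash does not tilde-expand inside double quotes; that choice belongs to
the sketch only.

```python
import shlex
from typing import List

# Assumption of this sketch: $HOME instead of a leading ~, since bash
# leaves ~ unexpanded inside double quotes.
VENV = "$HOME/.chaosengine/vllm-venv"


def build_wsl_argv(model_arg: str, port: int, max_model_len: int) -> List[str]:
    """Compose a ``wsl -- bash -c "<inner>"`` argv (hypothetical helper)."""
    inner = (
        # Double quotes keep an inherited PATH with spaces in one word.
        f'export PATH="{VENV}/bin:$PATH" && '
        f'"{VENV}/bin/python" -m vllm.entrypoints.openai.api_server '
        # shlex.quote protects spaces and quotes in the model path.
        f"--model {shlex.quote(model_arg)} "
        f"--host 127.0.0.1 --port {port} --max-model-len {max_model_len}"
    )
    return ["wsl", "--", "bash", "-c", inner]


argv = build_wsl_argv("/mnt/c/My Models/Qwen3-7B", 8000, 8192)
# shlex.split applies POSIX word-splitting rules without expanding
# variables, so it shows how the shell would tokenise the inner command.
tokens = shlex.split(argv[4])
```

Splitting the inner command with ``shlex.split`` is a cheap way to check
that a model path containing spaces still arrives as a single argument
after ``--model``.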

Plus the install endpoint's preflight already grew a clear
"sudo apt install python3-venv" message (last commit) — caught the
same way, just earlier in the chain.
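
The readiness-probe failure from item 3 can be reproduced without a
server. The sketch below is an assumption-laden illustration:
``parse_probe`` and ``wait_until_ready`` are hypothetical stand-ins for
``_http_json`` and ``_wait_for_server``, and the JSON body is only an
example of a list-shaped response, not captured vLLM output.

```python
import json
import time
from typing import Callable


def parse_probe(body: str) -> dict:
    """Parse a probe response the way a JSON-only HTTP helper would."""
    return json.loads(body)


def wait_until_ready(
    fetch: Callable[[], str], timeout: float, interval: float = 0.01
) -> bool:
    """Poll ``fetch`` until its body parses as JSON or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            parse_probe(fetch())
            return True
        except ValueError:
            # json.JSONDecodeError subclasses ValueError; an empty body
            # lands here on every attempt, so the loop never succeeds.
            time.sleep(interval)
    return False


# An empty 200 body never parses, so the poll runs out the clock.
empty_body_ready = wait_until_ready(lambda: "", timeout=0.05)
# A JSON body parses on the first attempt.
json_body_ready = wait_until_ready(
    lambda: '{"object": "list", "data": []}', timeout=0.05
)
```

An alternative would be to treat an empty 200 body as success in the
HTTP helper; probing an endpoint that returns JSON keeps the helper
unchanged.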

New file ``scripts/live_e2e_vllm_wsl.py`` — not part of the
regular test suite; one-shot script that probes capabilities,
constructs the engine, loads a tiny chat-tuned model
(Qwen/Qwen2.5-0.5B-Instruct), generates a deterministic prompt,
prints metrics, tears down. Run from Windows + WSL with
vllm-venv installed: ``.venv\Scripts\python.exe scripts\live_e2e_vllm_wsl.py``.
Exit 0 on success, 1 with full traceback on failure.

Tests: 15 in test_vllm_wsl_engine.py still pass (3 lifecycle +
3 command-shape + 5 path-translation + 2 platform-gate +
2 capability-gate). All 42 in the wider WSL-bridge test files green.

Live-test run output:
    Loaded in 96.3s
    engine:        vllm-wsl
    runtimeNote:   vLLM 0.21.0 running inside WSL (Ubuntu-24.04).
    pid:           34036
    port:          58586
    text:          'Paris.'
    finishReason:  stop
    promptTokens:  34
    completionTokens: 3
    responseSeconds: 1.19
---
 backend_service/inference/vllm_wsl_engine.py | 109 +++++++++++------
 scripts/live_e2e_vllm_wsl.py                 | 121 +++++++++++++++++++
 tests/test_vllm_wsl_engine.py                |  91 ++++++++++----
 3 files changed, 256 insertions(+), 65 deletions(-)
 create mode 100644 scripts/live_e2e_vllm_wsl.py

diff --git a/backend_service/inference/vllm_wsl_engine.py b/backend_service/inference/vllm_wsl_engine.py
index 0d70e3c..29dc8aa 100644
--- a/backend_service/inference/vllm_wsl_engine.py
+++ b/backend_service/inference/vllm_wsl_engine.py
@@ -31,6 +31,7 @@
 
 import json
 import re
+import shlex
 import subprocess
 import sys
 import tempfile
@@ -134,39 +135,62 @@ def _build_wsl_command(
         port: int,
         max_model_len: int,
     ) -> list[str]:
-        """Compose the ``wsl -- python -m vllm.entrypoints...`` argv.
-
-        Pulled out for tests + so the comment about each flag lives
-        next to the flag rather than buried in load_model.
+        """Compose the ``wsl -- bash -c "<inner cmd>"`` argv.
+
+        We wrap the python invocation in ``bash -c`` so we can prepend
+        the venv's ``bin/`` to ``PATH``. This matters because vLLM's
+        runtime path (notably flashinfer) JIT-builds CUDA sampling
+        kernels with ``ninja``, which lives at
+        ``~/.chaosengine/vllm-venv/bin/ninja`` after the install.
+        Without the venv bin on PATH the JIT build crashes with
+        ``FileNotFoundError: 'ninja'`` mid-startup (caught during the
+        FU-056 Phase 8 follow-up live test against opt-125m).
+
+        Arguments are ``shlex.quote``-escaped so a model path with
+        spaces / quotes can't break the bash parse.
         """
-        return [
-            "wsl",
-            "--",
-            f"{_WSL_VLLM_VENV_PATH}/bin/python",
-            "-m",
-            "vllm.entrypoints.openai.api_server",
-            "--model",
-            model_arg,
-            # ``--host 127.0.0.1`` keeps vLLM listening only on the
-            # loopback — WSL2 mirrors loopback to the Windows host so
-            # the Windows backend reaches it without any port-forward
-            # ceremony, and we don't expose the model to the LAN.
-            "--host",
-            "127.0.0.1",
-            "--port",
-            str(port),
-            # ``--max-model-len`` is vLLM's name for the context window
-            # cap. Defaults to whatever the model card declares, which
-            # can be too large for available VRAM. We pass through the
-            # user-selected ``contextTokens`` so the launch settings
-            # actually take effect.
-            "--max-model-len",
-            str(max_model_len),
-            # Trust the model config without prompting. vLLM's default
-            # is False, which throws ``ValueError: trust_remote_code``
-            # for repos like Qwen3-VL that ship custom modeling code.
-            "--trust-remote-code",
-        ]
+        bin_path = f"{_WSL_VLLM_VENV_PATH}/bin"
+        # PATH assignment is double-quoted: WSL2 interop appends the
+        # Windows PATH to bash's PATH, and that PATH contains entries
+        # with spaces (e.g. ``/mnt/c/Program Files/NVIDIA...``). Without
+        # the quotes bash word-splits the expanded ``$PATH`` and fails
+        # with ``export: 'Files/NVIDIA': not a valid identifier``.
+        #
+        # Env-var bypasses for vLLM 0.21+ flashinfer JIT path that
+        # otherwise crashes mid-startup with ``FileNotFoundError:
+        # 'ninja'`` (the JIT subprocess can't see venv bin on PATH
+        # through vLLM's multiprocessing fork chain). The flags +
+        # env vars below disable flashinfer's runtime kernel
+        # compilation and route attention through the pre-built
+        # TORCH_SDPA path instead.
+        #
+        # ``--host 127.0.0.1`` keeps vLLM bound to loopback — WSL2
+        # mirrors loopback to the Windows host so the Windows backend
+        # reaches the listener without any port-forward setup, and the
+        # model never leaks to the LAN.
+        #
+        # ``--max-model-len`` is vLLM's context cap; the model card's
+        # default is often too large for available VRAM.
+        #
+        # ``--trust-remote-code`` covers repos like Qwen3-VL that ship
+        # custom modeling code; vLLM refuses to import those by default.
+        #
+        # ``--enforce-eager`` skips CUDA-graph compilation. Loses some
+        # perf on long generations but avoids the second JIT path that
+        # would otherwise need a system ``nvcc``.
+        inner = (
+            f'export PATH="{bin_path}:$PATH" && '
+            'export VLLM_USE_FLASHINFER_SAMPLER=0 && '
+            'export VLLM_ATTENTION_BACKEND=TORCH_SDPA && '
+            f"{bin_path}/python -m vllm.entrypoints.openai.api_server "
+            f"--model {shlex.quote(model_arg)} "
+            f"--host 127.0.0.1 "
+            f"--port {port} "
+            f"--max-model-len {max_model_len} "
+            f"--trust-remote-code "
+            f"--enforce-eager"
+        )
+        return ["wsl", "--", "bash", "-c", inner]
 
     def _cleanup_process(self) -> None:
         if self.process is not None and self.process.poll() is None:
@@ -203,12 +227,19 @@ def process_pid(self) -> int | None:
         return int(self.process.pid)
 
     def _wait_for_server(self) -> None:
-        """Poll ``/health`` until vLLM accepts requests, or the subprocess dies.
-
-        vLLM's startup is slow (~30-90 s for a 7B model on cold cache)
-        because it builds the CUDA graph + warms KV blocks. We give it
-        the standard llama-timeout budget and surface the captured log
-        if it dies before becoming ready.
+        """Poll ``/v1/models`` until vLLM accepts requests, or the subprocess dies.
+
+        vLLM's startup is slow (~30-90 s for a 7B model on cold cache,
+        ~10-30 s for tiny models) because it builds CUDA-graph caches
+        + warms KV blocks. We give it the standard llama-timeout
+        budget and surface the captured log if it dies before becoming
+        ready.
+
+        Probes ``/v1/models`` rather than ``/health`` — vLLM's
+        ``/health`` returns 200 with an empty body, which trips
+        ``_http_json``'s ``json.loads`` (caught live during the FU-056
+        Phase 8 follow-up test). ``/v1/models`` returns the loaded
+        model list as JSON so the JSON parse succeeds.
         """
         deadline = time.time() + DEFAULT_LLAMA_TIMEOUT_SECONDS
         last_error = "vLLM (WSL) did not become ready."
@@ -217,7 +248,7 @@ def _wait_for_server(self) -> None:
                 logs = _read_text_tail(self.log_path)
                 raise RuntimeError(logs or "vLLM (WSL) exited during startup.")
             try:
-                _http_json(self._server_url("/health"), timeout=2.0)
+                _http_json(self._server_url("/v1/models"), timeout=2.0)
                 return
             except Exception as exc:  # noqa: BLE001 — best-effort poll
                 last_error = str(exc)
diff --git a/scripts/live_e2e_vllm_wsl.py b/scripts/live_e2e_vllm_wsl.py
new file mode 100644
index 0000000..7e0e999
--- /dev/null
+++ b/scripts/live_e2e_vllm_wsl.py
@@ -0,0 +1,121 @@
+"""Live end-to-end test for the vLLM WSL bridge (FU-056 Phase 8).
+
+Spawns ``VllmWslEngine`` against a tiny chat-tuned model
+(Qwen/Qwen2.5-0.5B-Instruct, ~1 GB), waits for the server, generates a single
+completion, prints the result, and tears down. Not part of the
+regular test suite — runs once to validate the bridge end to end
+with real vLLM + real CUDA + real WSL.
+
+Usage (run from the repo root, Windows + WSL with vllm-venv ready):
+    .venv\\Scripts\\python.exe scripts\\live_e2e_vllm_wsl.py
+
+Exit code 0 → bridge works, 1 → see stderr for the failure mode.
+"""
+
+from __future__ import annotations
+
+import sys
+import time
+import traceback
+
+from backend_service.inference.capabilities import _probe_native_backends
+from backend_service.inference.vllm_wsl_engine import VllmWslEngine
+
+
+def main() -> int:
+    print("=" * 60)
+    print("LIVE E2E: VllmWslEngine")
+    print("=" * 60)
+
+    # 1) Capabilities probe — bail loudly if WSL bridge isn't ready.
+    print("\n[1/5] Probing capabilities...")
+    caps = _probe_native_backends()
+    print(f"  wsl2Available:      {caps.wsl2Available}")
+    print(f"  wslDistroName:      {caps.wslDistroName}")
+    print(f"  wslCudaAvailable:   {caps.wslCudaAvailable}")
+    print(f"  wslVllmAvailable:   {caps.wslVllmAvailable}")
+    print(f"  wslVllmVersion:     {caps.wslVllmVersion}")
+    if not (caps.wsl2Available and caps.wslCudaAvailable and caps.wslVllmAvailable):
+        print("\nBridge not ready — bail.", file=sys.stderr)
+        return 1
+
+    # 2) Construct the engine.
+    print("\n[2/5] Constructing VllmWslEngine...")
+    engine = VllmWslEngine(caps)
+
+    # 3) Load a tiny chat-tuned model. ``Qwen/Qwen2.5-0.5B-Instruct``
+    #    is 0.5B params, ~1 GB on disk, vLLM-compatible AND ships a
+    #    chat template (OPT-125m doesn't — caught live on take 4).
+    #    Downloads + loads in 1-3 min from cold cache.
+    test_model = "Qwen/Qwen2.5-0.5B-Instruct"
+    print(f"\n[3/5] Loading {test_model} through the WSL bridge...")
+    print("      (vLLM cold-start: 30-90 s for graph build + CUDA warmup)")
+    start = time.perf_counter()
+    try:
+        info = engine.load_model(
+            model_ref=test_model,
+            model_name="Qwen2.5-0.5B-Instruct",
+            canonical_repo=test_model,
+            source="catalog",
+            backend="vllm",
+            path=None,
+            runtime_target=None,
+            cache_strategy="native",
+            cache_bits=0,
+            fp16_layers=0,
+            fused_attention=False,
+            fit_model_in_memory=True,
+            context_tokens=2048,
+        )
+    except Exception:  # noqa: BLE001 — print the full trace for triage
+        print("\nLOAD FAILED:", file=sys.stderr)
+        traceback.print_exc()
+        return 1
+    load_elapsed = time.perf_counter() - start
+    print(f"  Loaded in {load_elapsed:.1f}s")
+    print(f"  engine:        {info.engine}")
+    print(f"  ref:           {info.ref}")
+    print(f"  runtimeTarget: {info.runtimeTarget}")
+    print(f"  runtimeNote:   {info.runtimeNote}")
+    print(f"  pid:           {engine.process_pid()}")
+    print(f"  port:          {engine.port}")
+
+    # 4) Generate a small completion.
+    print("\n[4/5] Generating: 'The capital of France is'")
+    try:
+        result = engine.generate(
+            prompt="The capital of France is",
+            history=[],
+            system_prompt=None,
+            max_tokens=20,
+            temperature=0.0,
+        )
+    except Exception:  # noqa: BLE001
+        print("\nGENERATE FAILED:", file=sys.stderr)
+        traceback.print_exc()
+        try:
+            engine.unload_model()
+        except Exception:  # noqa: BLE001
+            pass
+        return 1
+
+    print(f"  text:             {result.text!r}")
+    print(f"  finishReason:     {result.finishReason}")
+    print(f"  promptTokens:     {result.promptTokens}")
+    print(f"  completionTokens: {result.completionTokens}")
+    print(f"  tokS:             {result.tokS}")
+    print(f"  responseSeconds:  {result.responseSeconds}")
+
+    # 5) Clean up.
+    print("\n[5/5] Unloading + terminating WSL subprocess...")
+    engine.unload_model()
+    print("  Done.")
+
+    print("\n" + "=" * 60)
+    print("LIVE E2E: SUCCESS")
+    print("=" * 60)
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main())
diff --git a/tests/test_vllm_wsl_engine.py b/tests/test_vllm_wsl_engine.py
index f516f70..460053d 100644
--- a/tests/test_vllm_wsl_engine.py
+++ b/tests/test_vllm_wsl_engine.py
@@ -141,10 +141,12 @@ def test_load_rejects_when_wsl_vllm_missing(self):
 
 
 class VllmWslEngineCommandTests(unittest.TestCase):
-    """Argv composition — checks each flag is present and ordered
-    so the WSL command stays valid for the upstream parser."""
+    """Argv composition — the command wraps a bash -c invocation so
+    the venv's bin/ can be prepended to PATH (vLLM's flashinfer JIT
+    needs ninja, which lives in the venv bin). Checks both the wsl
+    wrapper shape AND the inner bash command content."""
 
-    def test_build_command_includes_required_flags(self):
+    def test_build_command_wraps_in_bash_c(self):
         engine = VllmWslEngine(_make_caps())
         command = engine._build_wsl_command(
             model_arg="Qwen/Qwen3.5-7B",
@@ -152,30 +154,62 @@ def test_build_command_includes_required_flags(self):
             max_model_len=8192,
         )
 
-        # Prefix is the wsl entry-point + arg separator. Without ``--``
-        # the wsl CLI tries to interpret the rest as wsl options.
+        # Outer shape: ``wsl -- bash -c "<inner>"``. The ``--`` is
+        # essential — without it ``wsl`` tries to interpret subsequent
+        # tokens as its own options.
         self.assertEqual(command[0], "wsl")
         self.assertEqual(command[1], "--")
+        self.assertEqual(command[2], "bash")
+        self.assertEqual(command[3], "-c")
+        self.assertEqual(len(command), 5, "expected wsl -- bash -c <inner> shape")
 
-        # The venv-bound Python invocation — relative to the WSL
-        # user's $HOME via the leading ~. ``wsl --`` expands the ~.
-        self.assertIn("~/.chaosengine/vllm-venv/bin/python", command)
-        self.assertIn("-m", command)
-        self.assertIn("vllm.entrypoints.openai.api_server", command)
+    def test_inner_command_exports_path_and_runs_vllm(self):
+        engine = VllmWslEngine(_make_caps())
+        command = engine._build_wsl_command(
+            model_arg="Qwen/Qwen3.5-7B",
+            port=8000,
+            max_model_len=8192,
+        )
+        inner = command[4]
 
-        # User-driven flags.
-        self.assertIn("--model", command)
-        self.assertIn("Qwen/Qwen3.5-7B", command)
-        self.assertIn("--port", command)
-        self.assertIn("8000", command)
-        self.assertIn("--max-model-len", command)
-        self.assertIn("8192", command)
+        # PATH prefix: venv bin first, then existing PATH. Without
+        # this prefix the FlashInfer JIT can't find ``ninja`` and
+        # vLLM crashes mid-startup. The PATH value is double-quoted
+        # because the inherited $PATH on WSL contains Windows paths
+        # with spaces (``/mnt/c/Program Files/...``) which would
+        # otherwise word-split.
+        self.assertIn('export PATH="~/.chaosengine/vllm-venv/bin:$PATH"', inner)
 
-        # Safety: bound to loopback so the model isn't exposed to the
-        # LAN, and ``--trust-remote-code`` covers repos like Qwen3-VL.
-        self.assertIn("--host", command)
-        self.assertIn("127.0.0.1", command)
-        self.assertIn("--trust-remote-code", command)
+        # The vLLM OpenAI server invocation.
+        self.assertIn("~/.chaosengine/vllm-venv/bin/python", inner)
+        self.assertIn("-m vllm.entrypoints.openai.api_server", inner)
+
+        # User-driven flags.
+        self.assertIn("--model Qwen/Qwen3.5-7B", inner)
+        self.assertIn("--port 8000", inner)
+        self.assertIn("--max-model-len 8192", inner)
+        self.assertIn("--host 127.0.0.1", inner)
+        self.assertIn("--trust-remote-code", inner)
+        # vLLM 0.21+ flashinfer JIT bypasses + eager mode (avoid CUDA
+        # graph JIT). See engine source for the rationale.
+        self.assertIn("VLLM_USE_FLASHINFER_SAMPLER=0", inner)
+        self.assertIn("VLLM_ATTENTION_BACKEND=TORCH_SDPA", inner)
+        self.assertIn("--enforce-eager", inner)
+
+    def test_inner_command_quotes_model_arg_with_special_chars(self):
+        # A model path with a space (rare but possible — e.g.
+        # ``/mnt/c/My Models/Qwen3-7B``) must survive the shell parse.
+        engine = VllmWslEngine(_make_caps())
+        command = engine._build_wsl_command(
+            model_arg="/mnt/c/My Models/Qwen3-7B",
+            port=8000,
+            max_model_len=8192,
+        )
+        inner = command[4]
+        # ``shlex.quote`` wraps any string containing shell-unsafe
+        # characters in single quotes, so the original path must
+        # round-trip through the shell parse.
+        self.assertIn("'/mnt/c/My Models/Qwen3-7B'", inner)
 
 
 class VllmWslEngineLifecycleTests(unittest.TestCase):
@@ -222,10 +256,12 @@ def test_load_spawns_subprocess_and_polls_health(self):
                             context_tokens=8192,
                         )
 
-        # Subprocess was spawned exactly once with the WSL argv.
+        # Subprocess was spawned exactly once with the WSL argv shape.
         popen_mock.assert_called_once()
         spawned_argv = popen_mock.call_args.args[0]
         self.assertEqual(spawned_argv[0], "wsl")
+        self.assertEqual(spawned_argv[2], "bash")
+        self.assertEqual(spawned_argv[3], "-c")
 
         # Health probe was hit.
         http_mock.assert_called()
@@ -273,9 +309,12 @@ def test_load_translates_windows_path_in_runtime_target(self):
                         )
 
         spawned_argv = popen_mock.call_args.args[0]
-        # The model arg should have been translated into the WSL
-        # /mnt/c/... form so vLLM can find the weights from inside WSL.
-        self.assertIn("/mnt/c/Users/Dan/AI_Models/Qwen3-7B", spawned_argv)
+        # The model arg lives inside the bash ``-c`` inner command
+        # (argv[4]) since FU-056 Phase 8 follow-up moved to a wrapped
+        # invocation. Translation to /mnt/c/... must round-trip into
+        # the inner command verbatim so vLLM finds the weights.
+        inner = spawned_argv[4]
+        self.assertIn("/mnt/c/Users/Dan/AI_Models/Qwen3-7B", inner)
 
 
 if __name__ == "__main__":

From a6ac4d0349826b83b485fcef46a101dcebc878f8 Mon Sep 17 00:00:00 2001
From: Cryptopoly <31970407+cryptopoly@users.noreply.github.com>
Date: Sun, 17 May 2026 18:53:58 +0100
Subject: [PATCH 12/15] test: FU-056 test-coverage backstop (Phase 9)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Closes the test-coverage gap on everything FU-056 has shipped over
the previous eight commits. Three small additions across the
existing test-gate scripts:

1. **scripts/cache-strategy-matrix.py** — capability probe now
   considers the WSL vLLM bridge a valid vllm provider. Without
   this, all four vllm matrix cells would skip with "vLLM not
   installed (CUDA-only)" on Windows boxes even though the bridge
   route works (validated by the live e2e in commit c4f3701).
   New ``wsl_vllm_available`` field on BackendCapabilities; the
   skip-reason copy now names both routes so a user reading a
   skip-row knows their actionable next step regardless of OS.

2. **scripts/pre-build-check.mjs [5/8]** — extended with a new
   sub-probe that walks ``src/components/acceleratorCatalog.ts``
   for every (pipPackage, capabilityField) pair and asserts each
   one exists in (a) the backend's _INSTALLABLE_PIP_PACKAGES
   allow-list and (b) the BackendCapabilities dataclass.

   Surface: ``PASS Accelerator catalog ↔ backend (6 entries)``.
   Catches drift: adding a 7th catalog row without wiring its pip
   package + capability flag would fail the gate at build time
   rather than at first user click. Six entries today (nunchaku,
   sageattention, dflash-mlx, dflash-cuda, triattention, kvpress).

3. **scripts/e2e_test_suite.py phase 6** — two new read-only
   probes alongside the existing 7:
   - ``vllm-wsl-status``: GETs /api/setup/install-vllm-wsl/status
     and asserts the JSON shape (phase + done fields present).
     Verifies the Phase 8 install endpoint at minimum returns the
     expected schema even when no install has been started.
   - ``fu-056-capability-flags``: GETs /api/health and asserts all
     7 FU-056 Phase 1 capability fields are present on
     ``nativeBackends``. The fields are optional in the schema
     (older backends shouldn't crash the frontend), but the gate
     ensures release builds expose them.

   Phase 6 grows from 7 to 9 checks. Verified live against the
   user's running backend: PASS 9/9.
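
A minimal sketch of the catalog ↔ backend invariant from item 2,
written as standalone Python rather than the .mjs + inline-probe
split the gate uses. ``check_catalog``, the two-entry allow-list, the
``Capabilities`` dataclass and its field names are hypothetical
stand-ins for illustration; only the two regexes mirror the gate.

```python
import re
from dataclasses import dataclass, fields

# Hypothetical stand-in for the backend allow-list.
INSTALLABLE_PIP_PACKAGES = {"nunchaku", "kvpress"}


@dataclass
class Capabilities:
    # Hypothetical stand-in for the backend capabilities dataclass.
    nunchaku_available: bool = False
    kvpress_available: bool = False


def check_catalog(catalog_source, allowed_packages, caps_cls):
    """Return (missing_pip, missing_cap) for a catalog source string."""
    pip_pkgs = re.findall(r'pipPackage:\s*"([^"]+)"', catalog_source)
    cap_fields = re.findall(r'capabilityField:\s*"([^"]+)"', catalog_source)
    known_fields = {f.name for f in fields(caps_cls)}
    missing_pip = [p for p in pip_pkgs if p not in allowed_packages]
    missing_cap = [c for c in cap_fields if c not in known_fields]
    return missing_pip, missing_cap


# Illustrative catalog text: the second row is not wired on the backend.
CATALOG = """
  { pipPackage: "nunchaku", capabilityField: "nunchaku_available" },
  { pipPackage: "triattention", capabilityField: "triattention_available" },
"""

print(check_catalog(CATALOG, INSTALLABLE_PIP_PACKAGES, Capabilities))
# -> (['triattention'], ['triattention_available'])
```

Both lists must be empty for the gate to pass, so an unwired catalog
row shows up by name in the failure message.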

No new test files. Phase 9 is gate plumbing on existing scripts.
---
 scripts/cache-strategy-matrix.py | 22 ++++++++++-
 scripts/e2e_test_suite.py        | 49 +++++++++++++++++++++++++
 scripts/pre-build-check.mjs      | 63 +++++++++++++++++++++++++++++++-
 3 files changed, 130 insertions(+), 4 deletions(-)

diff --git a/scripts/cache-strategy-matrix.py b/scripts/cache-strategy-matrix.py
index a0f1ab0..6014cf9 100755
--- a/scripts/cache-strategy-matrix.py
+++ b/scripts/cache-strategy-matrix.py
@@ -199,6 +199,12 @@ class BackendCapabilities:
     vllm_available: bool
     has_turbo_binary: bool
     library_refs: set[str]
+    # FU-056 Phase 9: vLLM-via-WSL bridge availability. On Windows boxes
+    # native vLLM never works (no Windows wheels), but the WSL bridge
+    # gives the same engine class through a subprocess. The matrix
+    # runner treats either as "vllm cells can run" so a Windows + RTX
+    # box isn't permanently locked out of the vLLM lane.
+    wsl_vllm_available: bool = False
 
 
 def probe_backend(port: int) -> BackendCapabilities:
@@ -225,7 +231,15 @@ def probe_backend(port: int) -> BackendCapabilities:
         ddtree_available=bool(dflash.get("ddtreeAvailable")),
         mtplx_available=bool(native_backends.get("mtplxAvailable")),
         gguf_mtp_available=bool(native_backends.get("ggufMtpAvailable")),
-        vllm_available=bool(native_backends.get("vllmAvailable")),
+        # ``vllmAvailable`` (native) OR ``wslVllmAvailable`` (Windows
+        # bridge) — either route can serve the vllm cells. The runner
+        # doesn't care which path the backend chose; it cares whether
+        # a vllm load will succeed at all.
+        vllm_available=(
+            bool(native_backends.get("vllmAvailable"))
+            or bool(native_backends.get("wslVllmAvailable"))
+        ),
+        wsl_vllm_available=bool(native_backends.get("wslVllmAvailable")),
         has_turbo_binary=bool(system.get("llamaServerTurboPath")),
         library_refs=refs,
     )
@@ -236,7 +250,11 @@ def skip_reason(cell: MatrixCell, caps: BackendCapabilities, *, quick: bool) ->
         return "deferred to full run (drop --quick)"
 
     if cell.backend == "vllm" and not caps.vllm_available:
-        return "vLLM not installed (CUDA-only)"
+        # ``vllm_available`` already considers the WSL bridge (FU-056
+        # Phase 8). If neither route serves vLLM, the right next step
+        # depends on the platform, and the runner doesn't know the OS,
+        # so the skip reason names both install paths.
+        return "vLLM not available (install via Diagnostics → WSL2 vLLM bridge on Windows, or pip install vllm on Linux+CUDA)"
 
     canonical = {"chaosengine": "turboquant", "rotorquant": "turboquant"}.get(
         cell.strategy, cell.strategy,
diff --git a/scripts/e2e_test_suite.py b/scripts/e2e_test_suite.py
index 5c302f1..99c3c1b 100755
--- a/scripts/e2e_test_suite.py
+++ b/scripts/e2e_test_suite.py
@@ -701,6 +701,55 @@ def _probe(_cmd=cmd):
             return "pass", "", {"keys": sorted(payload.keys())[:8] if isinstance(payload, dict) else None}
         phase.checks.append(_check(name, _probe))
 
+    # FU-056 Phase 9: probe the new install-vllm-wsl status endpoint
+    # + the seven Phase 1 capability flags that the install panels
+    # gate on. The status endpoint is read-only (POST starts a job;
+    # GET returns the most-recent state, defaulting to ``idle``) so
+    # it's safe in this read-only phase.
+    def _vllm_wsl_status():
+        try:
+            with urllib.request.urlopen(
+                f"http://{_HOST}:{_PORT}/api/setup/install-vllm-wsl/status",
+                timeout=10.0,
+            ) as resp:
+                payload = json.loads(resp.read())
+        except Exception as exc:  # noqa: BLE001
+            return "fail", f"vllm-wsl status fetch failed: {exc}", {}
+        if not isinstance(payload, dict) or "phase" not in payload:
+            return "fail", "vllm-wsl status payload missing 'phase'", {}
+        return "pass", "", {"phase": payload.get("phase"), "done": payload.get("done")}
+
+    def _accelerator_flags():
+        try:
+            with urllib.request.urlopen(
+                f"http://{_HOST}:{_PORT}/api/health",
+                timeout=10.0,
+            ) as resp:
+                payload = json.loads(resp.read())
+        except Exception as exc:  # noqa: BLE001
+            return "fail", f"/api/health fetch failed: {exc}", {}
+        native = (payload or {}).get("nativeBackends") or {}
+        # The seven FU-056 Phase 1 flags (six accelerators + WSL2).
+        # Optional on the schema: older backends may not expose them.
+        # We don't assert any are True; we assert the keys are
+        # present so the frontend can read them without a fallback.
+        wanted = (
+            "nunchakuAvailable",
+            "sageattentionAvailable",
+            "dflashMlxAvailable",
+            "dflashCudaAvailable",
+            "triattentionAvailable",
+            "kvpressAvailable",
+            "wsl2Available",
+        )
+        missing = [k for k in wanted if k not in native]
+        if missing:
+            return "fail", f"nativeBackends missing FU-056 flags: {missing}", {}
+        return "pass", "", {"present_flags": len(wanted), "wsl2": native.get("wsl2Available")}
+
+    phase.checks.append(_check("vllm-wsl-status", _vllm_wsl_status))
+    phase.checks.append(_check("fu-056-capability-flags", _accelerator_flags))
+
     fails = [c for c in phase.checks if c.status == "fail"]
     phase.status = "fail" if fails else "pass"
     return phase
diff --git a/scripts/pre-build-check.mjs b/scripts/pre-build-check.mjs
index 9116465..e0a0a91 100755
--- a/scripts/pre-build-check.mjs
+++ b/scripts/pre-build-check.mjs
@@ -177,9 +177,9 @@ console.log("[4/8] Licence notices...");
 console.log();
 
 // ------------------------------------------------------------------
-// 5. Cache strategy validation
+// 5. Cache strategy + FU-056 accelerator catalog validation
 // ------------------------------------------------------------------
-console.log("[5/8] Cache strategy validation...");
+console.log("[5/8] Cache strategy + accelerator catalog validation...");
 {
   const probe = `
 from cache_compression import registry
@@ -216,6 +216,65 @@ print('OK')
     pass("Cache strategy validation");
   }
 }
+// FU-056 Phase 9: catalog ↔ backend invariant. Every accelerator the
+// frontend catalog promises must have (a) a matching pip-package
+// alias in the backend's _INSTALLABLE_PIP_PACKAGES allow-list, and
+// (b) a capability flag on BackendCapabilities so the UI can render
+// "Installed ✓" state. Probe walks the frontend TS source for the
+// catalog rows and cross-checks both surfaces from the backend side.
+{
+  const catalogPath = path.join(
+    REPO_ROOT,
+    "src",
+    "components",
+    "acceleratorCatalog.ts",
+  );
+  if (!existsSync(catalogPath)) {
+    fail(`acceleratorCatalog.ts missing at ${catalogPath}`);
+  } else {
+    const catalog = readFileSync(catalogPath, "utf8");
+    // Pull every pipPackage and capabilityField value out of the
+    // catalog entries. The shape is "    pipPackage: \"X\",". This
+    // deliberately avoids a TS parser dependency: the catalog file
+    // shape is stable and a regex catches drift just as well.
+    const pipMatches = [...catalog.matchAll(/pipPackage:\s*"([^"]+)"/g)].map(
+      (m) => m[1],
+    );
+    const capMatches = [...catalog.matchAll(/capabilityField:\s*"([^"]+)"/g)].map(
+      (m) => m[1],
+    );
+
+    const probe = `
+import sys
+from backend_service.routes.setup import _INSTALLABLE_PIP_PACKAGES
+from backend_service.inference.base import BackendCapabilities
+
+# Frontend catalog rows the pre-build script extracted from TS.
+PIP_PKGS = ${JSON.stringify(pipMatches)}
+CAP_FIELDS = ${JSON.stringify(capMatches)}
+
+missing_pip = [p for p in PIP_PKGS if p not in _INSTALLABLE_PIP_PACKAGES]
+caps_fields = set(BackendCapabilities.__dataclass_fields__.keys())
+missing_cap = [c for c in CAP_FIELDS if c not in caps_fields]
+
+if missing_pip:
+    print(f"INVALID: catalog pipPackage(s) missing from _INSTALLABLE_PIP_PACKAGES: {missing_pip}")
+if missing_cap:
+    print(f"INVALID: catalog capabilityField(s) missing from BackendCapabilities: {missing_cap}")
+if not (missing_pip or missing_cap):
+    print(f"OK: {len(PIP_PKGS)} accelerator(s), all wired")
+`.trim();
+    const result = capture(venvPython(), ["-c", probe]);
+    const out = `${result.stdout}\n${result.stderr}`;
+    if (out.includes("INVALID")) {
+      fail(`Accelerator catalog invariant: ${out.trim()}`);
+    } else if (!result.ok) {
+      fail(`Accelerator catalog probe crashed (exit ${result.code}): ${out.trim()}`);
+    } else {
+      pass(`Accelerator catalog ↔ backend (${pipMatches.length} entries)`);
+    }
+  }
+}
 console.log();
 
 // ------------------------------------------------------------------

From 1ef3c5660c91590997015fe93b2ad1d258913d29 Mon Sep 17 00:00:00 2001
From: Cryptopoly <31970407+cryptopoly@users.noreply.github.com>
Date: Sun, 17 May 2026 19:23:56 +0100
Subject: [PATCH 13/15] fix: backend honors --port/--host CLI args + WSL test
 scripts (FU-056)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Caught during the live WSL test sweep: ``backend_service.app.main()``
hard-coded ``port=DEFAULT_PORT`` in the ``uvicorn.run`` call and ignored
the ``--port`` flag the test scripts have been passing. Worked
historically because DEFAULT_PORT already reads ``CHAOSENGINE_PORT``
env, so test runs that set the env var got the right port — but
``python -m backend_service.app --port 8877`` silently bound 8876.

Now ``main()`` uses argparse with env-var fallbacks:

    --port  → $CHAOSENGINE_PORT → 8876
    --host  → $CHAOSENGINE_HOST → 127.0.0.1

CLI > env > default. Surfaces ``--help`` properly (the user can
discover the args). The existing env-var path keeps working for the
Tauri shell + headless test scripts that already set ``CHAOSENGINE_PORT``.
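
A minimal sketch of the precedence chain, assuming the defaults named
above (8876 and 127.0.0.1). ``resolve_bind`` is a hypothetical helper
that takes argv and the environment as parameters so the three cases
can be exercised without starting uvicorn; the real ``main()`` reads
``os.environ`` and ``sys.argv`` directly.

```python
import argparse

DEFAULT_HOST = "127.0.0.1"
DEFAULT_PORT = 8876


def resolve_bind(argv, env):
    """Resolve (host, port) with CLI > env > default precedence."""
    parser = argparse.ArgumentParser(description="bind-address sketch")
    # The env var only supplies the argparse default, so an explicit
    # flag on the command line always wins over it.
    parser.add_argument("--host", default=env.get("CHAOSENGINE_HOST", DEFAULT_HOST))
    parser.add_argument(
        "--port",
        type=int,
        default=int(env.get("CHAOSENGINE_PORT", DEFAULT_PORT)),
    )
    args = parser.parse_args(argv)
    return args.host, args.port


print(resolve_bind([], {}))
# -> ('127.0.0.1', 8876)
print(resolve_bind([], {"CHAOSENGINE_PORT": "8877"}))
# -> ('127.0.0.1', 8877)
print(resolve_bind(["--port", "9000"], {"CHAOSENGINE_PORT": "8877"}))
# -> ('127.0.0.1', 9000)
```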

Three new helper scripts under ``scripts/`` for the WSL dev workflow:
- ``install_llama_server_wsl.sh`` — downloads the latest llama.cpp
  Linux release into ``~/.chaosengine/bin/`` for the WSL backend.
- ``run_backend_wsl.sh`` — launches the backend on port 8877 with
  auth disabled (env: ``CHAOSENGINE_REQUIRE_AUTH=0``), pointing at the
  WSL-side llama-server. Detached via nohup + disown.
- ``probe_backend_wsl.sh`` — diagnostic helper; runs the backend
  foreground for 3 s and surfaces import / bind errors.

WSL test sweep results (Ubuntu-24.04, RTX 4090, vllm-venv at 0.21.0):
- pytest tests/ — 1472 passed, 21 failed, 21 skipped
  (49 more passes than Windows — fewer platform-specific failures)
- e2e_test_suite.py --smoke — 6/0/0 PASS including the two new
  FU-056 Phase 9 phase-6 probes (vllm-wsl-status + capability flags)
- cache-strategy-matrix.py --quick — 0/0 ran, 15/15 skipped honestly
  (only ``native`` strategy in dev venv; no turbo binary, no dflash,
  no models in dev library — all skip reasons accurate)
---
 backend_service/app.py              | 24 ++++++++++-
 scripts/install_llama_server_wsl.sh | 63 +++++++++++++++++++++++++++++
 scripts/probe_backend_wsl.sh        | 11 +++++
 scripts/run_backend_wsl.sh          | 17 ++++++++
 4 files changed, 113 insertions(+), 2 deletions(-)
 create mode 100644 scripts/install_llama_server_wsl.sh
 create mode 100644 scripts/probe_backend_wsl.sh
 create mode 100644 scripts/run_backend_wsl.sh

diff --git a/backend_service/app.py b/backend_service/app.py
index dea890e..b031f2b 100644
--- a/backend_service/app.py
+++ b/backend_service/app.py
@@ -779,15 +779,35 @@ def _watcher():
 
 
 def main() -> None:
+    import argparse
     import uvicorn
 
+    # CLI args take precedence over env vars which take precedence over
+    # the hardcoded DEFAULT_*. Honoring CHAOSENGINE_PORT lines up with
+    # the env-var the e2e suite + cache-strategy matrix already read,
+    # so a parallel dev backend on port 8877 only needs one variable
+    # set across the whole test surface.
+    parser = argparse.ArgumentParser(description="ChaosEngineAI backend sidecar.")
+    parser.add_argument(
+        "--host",
+        default=os.environ.get("CHAOSENGINE_HOST", DEFAULT_HOST),
+        help="Bind address (default: $CHAOSENGINE_HOST or DEFAULT_HOST).",
+    )
+    parser.add_argument(
+        "--port",
+        type=int,
+        default=int(os.environ.get("CHAOSENGINE_PORT", str(DEFAULT_PORT))),
+        help="Bind port (default: $CHAOSENGINE_PORT or DEFAULT_PORT).",
+    )
+    args = parser.parse_args()
+
     # Watch for parent death so we don't orphan ourselves
     _watch_parent_and_exit()
 
     uvicorn.run(
         "backend_service.app:app",
-        host=DEFAULT_HOST,
-        port=DEFAULT_PORT,
+        host=args.host,
+        port=args.port,
         reload=False,
     )
 
diff --git a/scripts/install_llama_server_wsl.sh b/scripts/install_llama_server_wsl.sh
new file mode 100644
index 0000000..92ab77b
--- /dev/null
+++ b/scripts/install_llama_server_wsl.sh
@@ -0,0 +1,63 @@
+#!/usr/bin/env bash
+# One-shot installer: fetch latest llama.cpp Linux release into
+# ~/.chaosengine/bin so the WSL dev backend has a usable llama-server
+# binary.
+set -euo pipefail
+
+INSTALL_DIR="${HOME}/.chaosengine/bin"
+mkdir -p "$INSTALL_DIR"
+
+cat >/tmp/find_llamacpp.py <<'EOF'
+import json, sys, urllib.request
+url = "https://api.github.com/repos/ggml-org/llama.cpp/releases/latest"
+with urllib.request.urlopen(url) as r:
+    data = json.load(r)
+tag = data["tag_name"]
+# Match the plain ubuntu-x64 tar.gz (CPU build — vulkan/sycl/openvino/
+# rocm/cuda variants need their respective runtime; the dev runner only
+# needs llama-server's HTTP path, not GPU acceleration).
+target = None
+for a in data["assets"]:
+    n = a["name"].lower()
+    if n.startswith(f"llama-{tag.lower()}-bin-ubuntu-x64.tar.gz"):
+        target = a["browser_download_url"]
+        break
+if not target:
+    print(f"ERROR: no plain ubuntu-x64 asset in {tag}", file=sys.stderr)
+    sys.exit(1)
+print(f"{tag} {target}")
+EOF
+
+RESULT=$(python3 /tmp/find_llamacpp.py)
+TAG=$(echo "$RESULT" | awk '{print $1}')
+URL=$(echo "$RESULT" | awk '{print $2}')
+echo "Downloading $TAG from $URL ..."
+
+TMPDIR=$(mktemp -d)
+trap 'rm -rf "$TMPDIR"' EXIT
+
+curl -fsSL "$URL" -o "$TMPDIR/llamacpp.tar.gz"
+tar -xzf "$TMPDIR/llamacpp.tar.gz" -C "$TMPDIR"
+
+# Find the llama-server binary in the extracted tree.
+SERVER_BIN=$(find "$TMPDIR" -name 'llama-server' -type f 2>/dev/null | head -1)
+if [ -z "$SERVER_BIN" ]; then
+    echo "llama-server binary not found in tarball"
+    find "$TMPDIR" -type f | head -20
+    exit 1
+fi
+cp "$SERVER_BIN" "$INSTALL_DIR/llama-server"
+chmod +x "$INSTALL_DIR/llama-server"
+echo "$TAG" > "$INSTALL_DIR/llama-server.version"
+
+# Bundle shared libraries the binary depends on. The Linux release
+# layout has libllama.so / libggml*.so alongside the binary.
+find "$TMPDIR" -name '*.so*' -exec cp {} "$INSTALL_DIR/" \; 2>/dev/null || true
+
+echo ""
+echo "Installed to: $INSTALL_DIR/llama-server"
+echo "Version:      $TAG"
+echo ""
+
+# Smoke-test the install.
+"$INSTALL_DIR/llama-server" --version 2>&1 | head -3 || true
diff --git a/scripts/probe_backend_wsl.sh b/scripts/probe_backend_wsl.sh
new file mode 100644
index 0000000..9f3e8a1
--- /dev/null
+++ b/scripts/probe_backend_wsl.sh
@@ -0,0 +1,11 @@
+#!/usr/bin/env bash
+# Quick probe: does backend_service.app import + can it bind a port?
+set -e
+cd /home/dan/ChaosEngineAI
+echo "=== importing backend_service.app ==="
+.venv/bin/python -c "import backend_service.app as a; print('OK, main:', callable(a.main))"
+echo "=== running main with --help ==="
+.venv/bin/python -m backend_service.app --help 2>&1 | head -10 || echo "main failed exit $?"
+echo "=== running for 3 seconds ==="
+timeout 3s .venv/bin/python -m backend_service.app --port 8877 2>&1 | head -30 || true
+echo "=== probe done ==="
diff --git a/scripts/run_backend_wsl.sh b/scripts/run_backend_wsl.sh
new file mode 100644
index 0000000..9b30496
--- /dev/null
+++ b/scripts/run_backend_wsl.sh
@@ -0,0 +1,17 @@
+#!/usr/bin/env bash
+# Launch the dev backend inside WSL on port 8877 with auth disabled
+# (test-runs don't need the bearer-token gate). Avoids the host's
+# 8876 (which the Windows-side ChaosEngineAI binds — WSL2 mirrors
+# loopback so the port collision is real).
+set -euo pipefail
+
+cd /home/dan/ChaosEngineAI
+export CHAOSENGINE_LLAMA_SERVER="$HOME/.chaosengine/bin/llama-server"
+export CHAOSENGINE_PORT=8877
+export CHAOSENGINE_HOST=127.0.0.1
+export CHAOSENGINE_REQUIRE_AUTH=0
+
+nohup .venv/bin/python -m backend_service.app \
+    > /tmp/backend_wsl.log 2>&1 &
+echo "PID=$!"
+disown

From 33e6dda5a93d7722384316e45d08bca20b481789 Mon Sep 17 00:00:00 2001
From: Cryptopoly <31970407+cryptopoly@users.noreply.github.com>
Date: Sun, 17 May 2026 20:02:57 +0100
Subject: [PATCH 14/15] feat: hide MTPLX on non-Apple-Silicon + chat
 empty-state banner (FU-056)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two related UX cleanups landed together because they share the same
plumbing pattern (App.tsx → tabs → leaf components):

1. **Hide MTPLX install affordances on Windows / Linux.** The MTPLX
   block in RuntimeControls (the launch-settings modal that opens
   from Chat / Compare / HTML Challenge / Benchmarks) used to render
   the MTPLX checkbox + "Install MTPLX" button + info disclosure on
   every host. MTPLX is Apple-Silicon-only — the install would error
   on Windows and the checkbox would render disabled with no path to
   recovery. Per the FU-034 rule (hide unrecoverable options, don't
   grey them out), the whole block is now gated on a new
   ``isAppleSilicon`` prop threaded from App.tsx via:
       App.tsx
         → LaunchModal / CompareView / HtmlChallengeTab /
           BenchmarkRunTab
         → ChallengePickerModal (for HtmlChallengeTab)
         → ModelLaunchModal
         → RuntimeControls
   Three call sites on RuntimeControls (the MTPLX label, the
   info-panel expand, the info button) now ALL gate on the prop.
   ``dflash-mlx`` was already platform-gated via the FU-056 Phase 2
   AcceleratorCard catalog (platformGate: "apple-silicon").

2. **Chat empty-state banner.** Fresh-install users opening the
   Chat tab used to see "Send a message to start the conversation."
   followed by a silent auto-load of the largest MLX direct variant
   (a 15+ GB download that doesn't even work on Windows/Linux —
   MLX backend doesn't exist there). Replaced with a
   ``<ChatEmptyStateBanner>`` that surfaces a clear CTA: "Browse
   Discover" when library is empty, "Open Models" when models are
   present but none loaded. No silent auto-loads, no confused users
   waiting on the wrong download.

   The banner is purely additive: the composer textarea above stays
   usable (users can still type while the banner suggests Discover).

Plumbing:
- New ``src/utils/platform.ts`` with ``isAppleSiliconHost``,
  ``isCudaHost``, ``isIntelMac`` helpers. Reads from
  ``workspace.system`` (platform + arch) which the backend already
  populates from ``platform.system()`` + ``platform.machine()``.
- 15 unit tests in ``src/utils/__tests__/platform.test.ts`` pin
  every host-classifier branch (Darwin arm64, aarch64, Intel Mac,
  Windows, Linux, null/undefined, case-insensitive).
- ``isAppleSiliconHost(workspace.system)`` computed once at App.tsx
  top-level, threaded as ``isAppleSilicon`` prop to the four call
  sites that own MTPLX surfaces.
- New ``<ChatEmptyStateBanner>`` component with two states
  (no-models / no-loaded-model), each with appropriate CTA.
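
The host classification can be sketched against the same
``platform`` + ``arch`` values the backend reports. This is a Python
illustration, not the TypeScript helper itself: the function names,
the accepted arch strings and the dict shape are assumptions, and
``isCudaHost`` is left out because its inputs are not described here.

```python
def _norm(system, key):
    """Lower-cased string value for ``key``, or "" when missing."""
    if not system:
        return ""
    return str(system.get(key) or "").lower()


def is_apple_silicon_host(system):
    # Assumed rule: Darwin plus an ARM arch string, case-insensitive.
    return (
        _norm(system, "platform") == "darwin"
        and _norm(system, "arch") in {"arm64", "aarch64"}
    )


def is_intel_mac(system):
    # Assumed rule: Darwin plus an x86-64 arch string.
    return (
        _norm(system, "platform") == "darwin"
        and _norm(system, "arch") in {"x86_64", "amd64"}
    )


print(is_apple_silicon_host({"platform": "Darwin", "arch": "arm64"}))
# -> True
print(is_apple_silicon_host({"platform": "Windows", "arch": "AMD64"}))
# -> False
print(is_apple_silicon_host(None))
# -> False
```

A missing probe result classifies as not Apple Silicon, which matches
the safe default of keeping MLX-only UI hidden on early paint.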

Tests: 35 files / 424 vitest tests pass (+15 from platform helper).
tsc clean. No new pytest needed — backend unchanged.

Not addressed in this commit (deferred):
- MLX-only image / video catalog variants still surface in Discover
  / Models tabs on Win/Linux. Filtering those is a larger UX call —
  hide entirely vs. show with "Apple Silicon only" pill — deserves
  its own decision before code.
- "llama-server installed by default" — already the case via
  scripts/stage-runtime.mjs for release builds. No code change.
---
 src/App.tsx                                   | 14 ++++
 src/components/LaunchModal.tsx                |  4 +
 src/components/ModelLaunchModal.tsx           |  5 ++
 src/components/RuntimeControls.tsx            | 25 ++++--
 src/features/benchmarks/BenchmarkRunTab.tsx   |  5 ++
 src/features/chat/ChatEmptyStateBanner.tsx    | 84 +++++++++++++++++++
 src/features/chat/ChatTab.tsx                 | 14 ++++
 src/features/chat/ChatThread.tsx              | 30 ++++++-
 src/features/chat/CompareView.tsx             |  4 +
 src/features/chat/HtmlChallengeTab.tsx        |  4 +
 .../html_challenge/ChallengePickerModal.tsx   |  4 +
 src/styles.css                                | 32 +++++++
 src/utils/__tests__/platform.test.ts          | 78 +++++++++++++++++
 src/utils/index.ts                            |  1 +
 src/utils/platform.ts                         | 70 ++++++++++++++++
 15 files changed, 365 insertions(+), 9 deletions(-)
 create mode 100644 src/features/chat/ChatEmptyStateBanner.tsx
 create mode 100644 src/utils/__tests__/platform.test.ts
 create mode 100644 src/utils/platform.ts

diff --git a/src/App.tsx b/src/App.tsx
index f9e649a..cb0e05f 100644
--- a/src/App.tsx
+++ b/src/App.tsx
@@ -89,6 +89,7 @@ import {
   compareOptionalNumber,
   serverOriginFromBase,
   isUnsavedEmptySession,
+  isAppleSiliconHost,
 } from "./utils";
 import {
   useWorkspace,
@@ -649,6 +650,12 @@ export default function App() {
 
   // ── Cross-domain derived state ─────────────────────────────
   const nativeBackends = workspace.runtime.nativeBackends;
+  // FU-056 follow-up: derive once, thread to surfaces that gate
+  // Apple-Silicon-only affordances (MTPLX in launch settings, MLX-LM
+  // install panels, mlx-video install rows). Reads platform/arch from
+  // the system probe — falls to ``false`` on early paint before the
+  // probe lands, which is the safe default (don't flash MLX UI).
+  const isAppleSilicon = isAppleSiliconHost(workspace.system);
   const filteredLogs = workspace.logs.filter((entry) => {
     const haystack = `${entry.ts} ${entry.source} ${entry.level} ${entry.message}`.toLowerCase();
     return haystack.includes(logQuery.toLowerCase());
@@ -1610,6 +1617,9 @@ export default function App() {
         loadedModelName={workspace.runtime.loadedModel?.name ?? null}
         onInstallPackage={handleInstallPackage}
         installingPackage={installingPackage}
+        noChatModelsInstalled={libraryChatOptions.length === 0}
+        onBrowseDiscover={() => setActiveTab("online-models")}
+        onOpenModels={() => setActiveTab("my-models")}
       />
     );
   } else if (activeTab === "chat-compare") {
@@ -1628,6 +1638,7 @@ export default function App() {
         onInstallMtplx={() => void handleInstallMtplx()}
         installingMtplx={installingMtplx}
         mtplxJob={mtplxJob}
+        isAppleSilicon={isAppleSilicon}
         onInstallPackage={handleInstallPackage}
         installingPackage={installingPackage}
         installLogs={installLogs}
@@ -1648,6 +1659,7 @@ export default function App() {
         onInstallMtplx={() => void handleInstallMtplx()}
         installingMtplx={installingMtplx}
         mtplxJob={mtplxJob}
+        isAppleSilicon={isAppleSilicon}
         onInstallPackage={handleInstallPackage}
         installingPackage={installingPackage}
         installLogs={installLogs}
@@ -1728,6 +1740,7 @@ export default function App() {
         onInstallMtplx={() => void handleInstallMtplx()}
         installingMtplx={installingMtplx}
         mtplxJob={mtplxJob}
+        isAppleSilicon={isAppleSilicon}
         onBenchmarkDraftChange={updateBenchmarkDraft}
         onBenchmarkPromptIdChange={setBenchmarkPromptId}
         onBenchmarkModelKeyChange={setBenchmarkModelKey}
@@ -1981,6 +1994,7 @@ export default function App() {
         onInstallMtplx={() => void handleInstallMtplx()}
         installingMtplx={installingMtplx}
         mtplxJob={mtplxJob}
+        isAppleSilicon={isAppleSilicon}
         onPendingLaunchChange={setPendingLaunch}
         onLaunchModelSearchChange={setLaunchModelSearch}
         onLaunchSettingChange={updateLaunchSetting}
diff --git a/src/components/LaunchModal.tsx b/src/components/LaunchModal.tsx
index 131201b..cddfef1 100644
--- a/src/components/LaunchModal.tsx
+++ b/src/components/LaunchModal.tsx
@@ -28,6 +28,8 @@ export interface LaunchModalProps {
   onInstallMtplx?: () => void;
   installingMtplx?: boolean;
   mtplxJob?: MtplxJobState | null;
+  /** FU-056 follow-up: hide MTPLX block on non-Apple-Silicon hosts. */
+  isAppleSilicon?: boolean;
   onPendingLaunchChange: (value: PendingLaunch | null | ((prev: PendingLaunch | null) => PendingLaunch | null)) => void;
   onLaunchModelSearchChange: (value: string) => void;
   onLaunchSettingChange: <K extends keyof LaunchPreferences>(key: K, value: LaunchPreferences[K]) => void;
@@ -54,6 +56,7 @@ export function LaunchModal({
   onInstallMtplx,
   installingMtplx,
   mtplxJob,
+  isAppleSilicon = false,
   onPendingLaunchChange,
   onLaunchModelSearchChange,
   onLaunchSettingChange,
@@ -102,6 +105,7 @@ export function LaunchModal({
       onInstallMtplx={onInstallMtplx}
       installingMtplx={installingMtplx}
       mtplxJob={mtplxJob}
+      isAppleSilicon={isAppleSilicon}
       onSelectedKeyChange={setSelectedLaunchKey}
       onSearchChange={onLaunchModelSearchChange}
       onSettingChange={onLaunchSettingChange}
diff --git a/src/components/ModelLaunchModal.tsx b/src/components/ModelLaunchModal.tsx
index 7da1c69..d8dcd0e 100644
--- a/src/components/ModelLaunchModal.tsx
+++ b/src/components/ModelLaunchModal.tsx
@@ -70,6 +70,9 @@ export interface ModelLaunchModalProps {
   onInstallMtplx?: () => void;
   installingMtplx?: boolean;
   mtplxJob?: MtplxJobState | null;
+  /** FU-056 follow-up: forwarded to ``RuntimeControls`` so the MTPLX
+   * block hides on non-Apple-Silicon hosts where MTPLX can't run. */
+  isAppleSilicon?: boolean;
   onSelectedKeyChange: (key: string) => void;
   onSearchChange: (value: string) => void;
   onSettingChange: <K extends keyof LaunchPreferences>(key: K, value: LaunchPreferences[K]) => void;
@@ -100,6 +103,7 @@ export function ModelLaunchModal({
   onInstallMtplx,
   installingMtplx,
   mtplxJob,
+  isAppleSilicon = false,
   onSelectedKeyChange,
   onSearchChange,
   onSettingChange,
@@ -254,6 +258,7 @@ export function ModelLaunchModal({
               onInstallMtplx={onInstallMtplx}
               installingMtplx={installingMtplx}
               mtplxJob={mtplxJob}
+              isAppleSilicon={isAppleSilicon}
               compact
             />
           </div>
diff --git a/src/components/RuntimeControls.tsx b/src/components/RuntimeControls.tsx
index fb778f2..9adcd42 100644
--- a/src/components/RuntimeControls.tsx
+++ b/src/components/RuntimeControls.tsx
@@ -147,6 +147,12 @@ interface RuntimeControlsProps {
   onInstallMtplx?: () => void;
   installingMtplx?: boolean;
   mtplxJob?: MtplxJobState | null;
+  /** FU-056 follow-up: pass ``isAppleSilicon=true`` to surface the
+   * MTPLX block (Apple-Silicon-only). Defaults to ``false``, which
+   * hides the block. On Windows / Linux MTPLX can't run: the install
+   * button would error and the checkbox would render disabled with
+   * no path to recovery. Pass true on Darwin arm64 hosts only. */
+  isAppleSilicon?: boolean;
 }
 
 function StrategyInstallTerminal({
@@ -247,6 +253,7 @@ export function RuntimeControls({
   onInstallMtplx,
   installingMtplx,
   mtplxJob,
+  isAppleSilicon = false,
 }: RuntimeControlsProps) {
   const { t } = useTranslation("runtime");
   const effectiveMaxContext = Math.max(2048, maxContext ?? 262144);
@@ -752,13 +759,15 @@ export function RuntimeControls({
             <span className="slider-value">{settings.treeBudget ?? 0}</span>
           </div>
         ) : null}
-        {/* MTPLX: native in-model MTP speculative decoding. Hidden when the
-            model has no MTP heads (no install button helps that case). Shown
-            with an install button when the model is supported but the venv
-            is not yet installed. Uses the same speculativeDecoding field as
-            DFlash — the controller auto-routes to MtplxEngine when both the
-            model has MTP heads and the venv is installed. */}
-        {mtplxInfo?.modelSupported ? (
+        {/* MTPLX: native in-model MTP speculative decoding. Apple-Silicon-only —
+            the engine runs in an isolated venv at ~/.chaosengine/mtplx-venv
+            and requires the MLX framework which has no Linux/Windows build.
+            Hidden entirely off-platform (FU-056 follow-up): the install
+            button would error + the checkbox would render disabled with no
+            user-actionable recovery path. Apple Silicon hosts still need the
+            model itself to advertise MTP heads (``modelSupported``) — no
+            install button rescues a model without baked-in MTP heads. */}
+        {isAppleSilicon && mtplxInfo?.modelSupported ? (
           <div className="check-row">
             <label
               className="check-row"
@@ -812,7 +821,7 @@ export function RuntimeControls({
         {mtplxJob && mtplxJob.phase !== "idle" && mtplxJob.phase !== "done" ? (
           <InstallLogPanel job={mtplxJob} variant="mtplx" />
         ) : null}
-        {expandedInfo === "mtplx" && mtplxInfo?.modelSupported ? (
+        {isAppleSilicon && expandedInfo === "mtplx" && mtplxInfo?.modelSupported ? (
           <div className="cache-strategy-info-panel" style={{ marginTop: 4 }}>
             <p>
               {t("mtplx.body", {
diff --git a/src/features/benchmarks/BenchmarkRunTab.tsx b/src/features/benchmarks/BenchmarkRunTab.tsx
index 74395fc..d1c7f70 100644
--- a/src/features/benchmarks/BenchmarkRunTab.tsx
+++ b/src/features/benchmarks/BenchmarkRunTab.tsx
@@ -43,6 +43,9 @@ export interface BenchmarkRunTabProps {
   onInstallMtplx?: () => void;
   installingMtplx?: boolean;
   mtplxJob?: MtplxJobState | null;
+  /** FU-056 follow-up: forwarded to ``RuntimeControls`` so the MTPLX
+   * block hides on non-Apple-Silicon hosts. */
+  isAppleSilicon?: boolean;
   onBenchmarkDraftChange: <K extends keyof BenchmarkRunPayload>(key: K, value: BenchmarkRunPayload[K]) => void;
   onBenchmarkPromptIdChange: (id: string) => void;
   onBenchmarkModelKeyChange: (key: string) => void;
@@ -75,6 +78,7 @@ export function BenchmarkRunTab({
   onInstallMtplx,
   installingMtplx,
   mtplxJob,
+  isAppleSilicon = false,
   onBenchmarkDraftChange,
   onBenchmarkPromptIdChange,
   onBenchmarkModelKeyChange,
@@ -566,6 +570,7 @@ export function BenchmarkRunTab({
         onInstallMtplx={onInstallMtplx}
         installingMtplx={installingMtplx}
         mtplxJob={mtplxJob}
+        isAppleSilicon={isAppleSilicon}
         onSelectedKeyChange={(key) => {
           onBenchmarkModelKeyChange(key);
         }}
diff --git a/src/features/chat/ChatEmptyStateBanner.tsx b/src/features/chat/ChatEmptyStateBanner.tsx
new file mode 100644
index 0000000..dceca25
--- /dev/null
+++ b/src/features/chat/ChatEmptyStateBanner.tsx
@@ -0,0 +1,84 @@
+import { useTranslation } from "react-i18next";
+
+/**
+ * Empty-state CTA for the Chat tab on a fresh install (FU-056 follow-up).
+ *
+ * Renders inside the empty thread when the user opens Chat before
+ * downloading their first chat model. Instead of just "Send a message
+ * to start the conversation." (which silently auto-loads the largest
+ * MLX direct variant — broken on Windows + slow on Macs), this card
+ * points the user at the Discover tab where they can pick their own
+ * starter.
+ *
+ * Two states:
+ *   - No chat models in the library → "Browse Discover" CTA
+ *   - Models present but none loaded → "Open Models" hint
+ *
+ * Both states are non-blocking: the composer stays usable,
+ * but sending without a model does nothing useful, so the card sits
+ * inside the thread where the empty conversation would otherwise be.
+ */
+
+export interface ChatEmptyStateBannerProps {
+  /** True when the library has zero chat-capable models. Drives the
+   * primary CTA (Discover for new users, Models for users who have
+   * downloaded but not loaded). */
+  noChatModelsInstalled: boolean;
+  /** Fired when the user clicks the primary CTA. The parent maps this
+   * to the appropriate tab change. */
+  onBrowseDiscover: () => void;
+  /** Fired when the user clicks "Open Models" (only shown when they
+   * already have at least one chat model). */
+  onOpenModels: () => void;
+}
+
+export function ChatEmptyStateBanner({
+  noChatModelsInstalled,
+  onBrowseDiscover,
+  onOpenModels,
+}: ChatEmptyStateBannerProps) {
+  const { t } = useTranslation("chat");
+
+  if (noChatModelsInstalled) {
+    return (
+      <div className="chat-empty-banner" role="region" aria-label="Get started with chat">
+        <h3 className="chat-empty-banner-title">
+          {t("emptyBanner.welcomeTitle", {
+            defaultValue: "👋 Welcome to ChaosEngineAI Chat",
+          })}
+        </h3>
+        <p className="chat-empty-banner-body">
+          {t("emptyBanner.welcomeBody", {
+            defaultValue:
+              "Pick a chat model from Discover to get started. We recommend a small Qwen3 or Llama 3 variant for your first run — they download in a minute or two and run on any laptop.",
+          })}
+        </p>
+        <div className="chat-empty-banner-actions">
+          <button
+            type="button"
+            className="primary-button"
+            onClick={onBrowseDiscover}
+          >
+            {t("emptyBanner.browseDiscover", { defaultValue: "Browse Discover" })}
+          </button>
+        </div>
+      </div>
+    );
+  }
+
+  return (
+    <div className="chat-empty-banner" role="region" aria-label="Load a model to chat">
+      <p className="chat-empty-banner-body">
+        {t("emptyBanner.noModelLoaded", {
+          defaultValue:
+            "No model is loaded yet. Pick one from Models to start chatting.",
+        })}
+      </p>
+      <div className="chat-empty-banner-actions">
+        <button type="button" className="secondary-button" onClick={onOpenModels}>
+          {t("emptyBanner.openModels", { defaultValue: "Open Models" })}
+        </button>
+      </div>
+    </div>
+  );
+}
diff --git a/src/features/chat/ChatTab.tsx b/src/features/chat/ChatTab.tsx
index 638e3ed..f83f476 100644
--- a/src/features/chat/ChatTab.tsx
+++ b/src/features/chat/ChatTab.tsx
@@ -121,6 +121,13 @@ export interface ChatTabProps {
   loadedModelName?: string | null;
   onInstallPackage?: (pipPackage: string) => void;
   installingPackage?: string | null;
+  /** FU-056 follow-up: empty-state banner pieces. Threads the
+   * "library has no chat models?" bit + the two tab-change handlers
+   * so the banner can point users at Discover (fresh install) or
+   * Models (have models, none loaded). */
+  noChatModelsInstalled?: boolean;
+  onBrowseDiscover?: () => void;
+  onOpenModels?: () => void;
 }
 
 // Avoid an unused-import diagnostic — ChatModelOption is still part of
@@ -181,6 +188,9 @@ export function ChatTab({
   loadedModelName,
   onInstallPackage,
   installingPackage,
+  noChatModelsInstalled = false,
+  onBrowseDiscover,
+  onOpenModels,
 }: ChatTabProps) {
   const { t } = useTranslation("chat");
   const modelBusyLabel =
@@ -443,6 +453,10 @@ export function ChatTab({
           onDetailsToggle={onDetailsToggle}
           onCancelGeneration={onCancelGeneration}
           onLoadModel={onLoadModel}
+          noChatModelsInstalled={noChatModelsInstalled}
+          loadedModelRef={loadedModelRef}
+          onBrowseDiscover={onBrowseDiscover}
+          onOpenModels={onOpenModels}
         />
         <ChatComposer
           draftMessage={draftMessage}
diff --git a/src/features/chat/ChatThread.tsx b/src/features/chat/ChatThread.tsx
index 2c5c33e..d0894ea 100644
--- a/src/features/chat/ChatThread.tsx
+++ b/src/features/chat/ChatThread.tsx
@@ -11,6 +11,7 @@ import { ChatPerfStrip } from "../../components/ChatPerfStrip";
 import { LogprobSummary } from "../../components/LogprobSummary";
 import { SubstrateRoutingBadge } from "../../components/SubstrateRoutingBadge";
 import { ToolCallCard } from "../../components/ToolCallCard";
+import { ChatEmptyStateBanner } from "./ChatEmptyStateBanner";
 import type { ChatSession, ChatMessageVariant, LaunchPreferences, ModelLoadingState, WarmModel } from "../../types";
 import { number } from "../../utils";
 import { VariantPickerButton } from "./VariantPickerButton";
@@ -42,6 +43,15 @@ export interface ChatThreadProps {
   engineLabel: string;
   launchSettings: LaunchPreferences;
   busy: boolean;
+  /** FU-056 follow-up: when true, the empty-state CTA points the
+   * user at Discover. When false (models present), it points at
+   * Models. Renders inside the empty thread only while no model is
+   * loaded and both handlers are supplied, so users on a fresh
+   * install never see a blank Chat tab with no path forward. */
+  noChatModelsInstalled?: boolean;
+  loadedModelRef?: string | null;
+  onBrowseDiscover?: () => void;
+  onOpenModels?: () => void;
   onChatFileDrop: (files: FileList) => void;
   onCopyMessage: (text: string) => void;
   onRetryMessage: (index: number) => void;
@@ -94,6 +104,10 @@ export function ChatThread({
   onDetailsToggle,
   onCancelGeneration,
   onLoadModel,
+  noChatModelsInstalled = false,
+  loadedModelRef = null,
+  onBrowseDiscover,
+  onOpenModels,
 }: ChatThreadProps) {
   const { t } = useTranslation("chat");
   return (
@@ -460,7 +474,21 @@ export function ChatThread({
         })
       ) : (
         <div className="empty-state">
-          <p>{t("thread.emptyState", { defaultValue: "Send a message to start the conversation." })}</p>
+          {/* FU-056 follow-up: redirect users with no installed chat
+              model to Discover, and users with no loaded model to
+              Models. The auto-load-largest-MLX-variant behaviour was
+              both confusing on Apple Silicon (15+ GB silent download)
+              and broken on Windows/Linux (MLX backend doesn't exist
+              there). Banner stays visible until a model is loaded. */}
+          {!loadedModelRef && onBrowseDiscover && onOpenModels ? (
+            <ChatEmptyStateBanner
+              noChatModelsInstalled={noChatModelsInstalled}
+              onBrowseDiscover={onBrowseDiscover}
+              onOpenModels={onOpenModels}
+            />
+          ) : (
+            <p>{t("thread.emptyState", { defaultValue: "Send a message to start the conversation." })}</p>
+          )}
         </div>
       )}
       {serverLoading ? (
diff --git a/src/features/chat/CompareView.tsx b/src/features/chat/CompareView.tsx
index e5d2e93..39338a7 100644
--- a/src/features/chat/CompareView.tsx
+++ b/src/features/chat/CompareView.tsx
@@ -73,6 +73,8 @@ interface CompareViewProps {
   onInstallMtplx?: () => void;
   installingMtplx?: boolean;
   mtplxJob?: MtplxJobState | null;
+  /** FU-056 follow-up: hide MTPLX block on non-Apple-Silicon hosts. */
+  isAppleSilicon?: boolean;
   onInstallPackage?: (strategyId: string) => void;
   installingPackage?: string | null;
   installLogs?: Record<string, StrategyInstallLog>;
@@ -350,6 +352,7 @@ export function CompareView({
   onInstallMtplx,
   installingMtplx,
   mtplxJob,
+  isAppleSilicon = false,
   onInstallPackage,
   installingPackage,
   installLogs,
@@ -806,6 +809,7 @@ export function CompareView({
         onInstallMtplx={onInstallMtplx}
         installingMtplx={installingMtplx}
         mtplxJob={mtplxJob}
+        isAppleSilicon={isAppleSilicon}
         onSelectedKeyChange={setPickerDraftKey}
         onSearchChange={setPickerSearch}
         onSettingChange={(key, value) => {
diff --git a/src/features/chat/HtmlChallengeTab.tsx b/src/features/chat/HtmlChallengeTab.tsx
index 38db449..c417f8e 100644
--- a/src/features/chat/HtmlChallengeTab.tsx
+++ b/src/features/chat/HtmlChallengeTab.tsx
@@ -86,6 +86,8 @@ interface HtmlChallengeTabProps {
   onInstallMtplx?: () => void;
   installingMtplx?: boolean;
   mtplxJob?: MtplxJobState | null;
+  /** FU-056 follow-up: hide MTPLX block on non-Apple-Silicon hosts. */
+  isAppleSilicon?: boolean;
   onInstallPackage?: (strategyId: string) => void;
   installingPackage?: string | null;
   installLogs?: Record<string, StrategyInstallLog>;
@@ -107,6 +109,7 @@ export function HtmlChallengeTab({
   onInstallMtplx,
   installingMtplx,
   mtplxJob,
+  isAppleSilicon = false,
   onInstallPackage,
   installingPackage,
   installLogs,
@@ -1138,6 +1141,7 @@ export function HtmlChallengeTab({
         onInstallMtplx={onInstallMtplx}
         installingMtplx={installingMtplx}
         mtplxJob={mtplxJob}
+        isAppleSilicon={isAppleSilicon}
         onConfirm={(selectedKey, newSettings) => {
           if (pickerTarget) {
             const target = pickerTarget;
diff --git a/src/features/chat/html_challenge/ChallengePickerModal.tsx b/src/features/chat/html_challenge/ChallengePickerModal.tsx
index 35a9579..1de99ac 100644
--- a/src/features/chat/html_challenge/ChallengePickerModal.tsx
+++ b/src/features/chat/html_challenge/ChallengePickerModal.tsx
@@ -32,6 +32,8 @@ interface ChallengePickerModalProps {
   onInstallMtplx?: () => void;
   installingMtplx?: boolean;
   mtplxJob?: MtplxJobState | null;
+  /** FU-056 follow-up: hide MTPLX block on non-Apple-Silicon hosts. */
+  isAppleSilicon?: boolean;
   onConfirm: (selectedKey: string, settings: LaunchPreferences) => void;
   onClose: () => void;
   onInstallPackage: (strategyId: string) => void;
@@ -54,6 +56,7 @@ export function ChallengePickerModal({
   onInstallMtplx,
   installingMtplx,
   mtplxJob,
+  isAppleSilicon = false,
   onConfirm,
   onClose,
   onInstallPackage,
@@ -107,6 +110,7 @@ export function ChallengePickerModal({
       onInstallMtplx={onInstallMtplx}
       installingMtplx={installingMtplx}
       mtplxJob={mtplxJob}
+      isAppleSilicon={isAppleSilicon}
       onSelectedKeyChange={setDraftKey}
       onSearchChange={setSearch}
       onSettingChange={(key, value) => {
diff --git a/src/styles.css b/src/styles.css
index 4abfb1c..d9389bf 100644
--- a/src/styles.css
+++ b/src/styles.css
@@ -2193,6 +2193,38 @@ select.text-input {
   height: 100%;
 }
 
+/* FU-056 follow-up: Chat empty-state welcome banner. Sits inside the
+   ``.empty-state`` dashed container; gives users on a fresh install
+   a clear CTA to Discover instead of the silent "send a message and
+   the largest MLX model auto-downloads" failure mode. */
+.chat-empty-banner {
+  display: flex;
+  flex-direction: column;
+  gap: 10px;
+  padding: 8px 4px;
+  max-width: 520px;
+  margin: 0 auto;
+  text-align: center;
+}
+.chat-empty-banner-title {
+  margin: 0;
+  font-size: 1.05rem;
+  font-weight: 600;
+  color: var(--text);
+}
+.chat-empty-banner-body {
+  margin: 0;
+  font-size: 0.88rem;
+  color: var(--muted-strong);
+  line-height: 1.5;
+}
+.chat-empty-banner-actions {
+  display: flex;
+  justify-content: center;
+  gap: 8px;
+  margin-top: 4px;
+}
+
 .loading-state-progress {
   display: flex;
   flex-direction: column;
diff --git a/src/utils/__tests__/platform.test.ts b/src/utils/__tests__/platform.test.ts
new file mode 100644
index 0000000..1e8f867
--- /dev/null
+++ b/src/utils/__tests__/platform.test.ts
@@ -0,0 +1,78 @@
+import { describe, expect, it } from "vitest";
+
+import { isAppleSiliconHost, isCudaHost, isIntelMac } from "../platform";
+
+describe("isAppleSiliconHost", () => {
+  it("returns true for Darwin + arm64", () => {
+    expect(isAppleSiliconHost({ platform: "Darwin", arch: "arm64" })).toBe(true);
+    expect(isAppleSiliconHost({ platform: "darwin", arch: "arm64" })).toBe(true);
+  });
+
+  it("treats Darwin + aarch64 as Apple Silicon (some probes report aarch64 instead of arm64)", () => {
+    expect(isAppleSiliconHost({ platform: "darwin", arch: "aarch64" })).toBe(true);
+  });
+
+  it("returns false for Intel Mac", () => {
+    expect(isAppleSiliconHost({ platform: "darwin", arch: "x86_64" })).toBe(false);
+  });
+
+  it("returns false for Windows + arm64 (still not Apple Silicon)", () => {
+    expect(isAppleSiliconHost({ platform: "windows", arch: "arm64" })).toBe(false);
+  });
+
+  it("returns false for Linux", () => {
+    expect(isAppleSiliconHost({ platform: "linux", arch: "x86_64" })).toBe(false);
+  });
+
+  it("returns false for null / undefined / partial system", () => {
+    expect(isAppleSiliconHost(null)).toBe(false);
+    expect(isAppleSiliconHost(undefined)).toBe(false);
+    // @ts-expect-error — exercising the early-paint defensive branch
+    expect(isAppleSiliconHost({ platform: "darwin" })).toBe(false);
+  });
+
+  it("is case-insensitive on both fields", () => {
+    expect(isAppleSiliconHost({ platform: "DARWIN", arch: "ARM64" })).toBe(true);
+  });
+});
+
+describe("isCudaHost", () => {
+  it("returns true for Windows + x86_64", () => {
+    expect(isCudaHost({ platform: "Windows", arch: "x86_64" })).toBe(true);
+    expect(isCudaHost({ platform: "windows", arch: "AMD64" })).toBe(true);
+  });
+
+  it("returns true for Linux + x86_64", () => {
+    expect(isCudaHost({ platform: "linux", arch: "x86_64" })).toBe(true);
+  });
+
+  it("returns false for Darwin (no CUDA on macOS)", () => {
+    expect(isCudaHost({ platform: "darwin", arch: "x86_64" })).toBe(false);
+    expect(isCudaHost({ platform: "darwin", arch: "arm64" })).toBe(false);
+  });
+
+  it("returns false for ARM Linux (not the CUDA-class hosts we ship for)", () => {
+    expect(isCudaHost({ platform: "linux", arch: "arm64" })).toBe(false);
+  });
+
+  it("returns false for null / undefined system", () => {
+    expect(isCudaHost(null)).toBe(false);
+    expect(isCudaHost(undefined)).toBe(false);
+  });
+});
+
+describe("isIntelMac", () => {
+  it("returns true for Darwin + x86_64", () => {
+    expect(isIntelMac({ platform: "darwin", arch: "x86_64" })).toBe(true);
+    expect(isIntelMac({ platform: "darwin", arch: "amd64" })).toBe(true);
+  });
+
+  it("returns false for Apple Silicon", () => {
+    expect(isIntelMac({ platform: "darwin", arch: "arm64" })).toBe(false);
+  });
+
+  it("returns false for Windows / Linux", () => {
+    expect(isIntelMac({ platform: "windows", arch: "x86_64" })).toBe(false);
+    expect(isIntelMac({ platform: "linux", arch: "x86_64" })).toBe(false);
+  });
+});
diff --git a/src/utils/index.ts b/src/utils/index.ts
index c8ac5ba..f98f2dc 100644
--- a/src/utils/index.ts
+++ b/src/utils/index.ts
@@ -10,3 +10,4 @@ export * from "./cache";
 export * from "./keyboard";
 export * from "./discoverSort";
 export * from "./capabilities";
+export * from "./platform";
diff --git a/src/utils/platform.ts b/src/utils/platform.ts
new file mode 100644
index 0000000..0a12587
--- /dev/null
+++ b/src/utils/platform.ts
@@ -0,0 +1,70 @@
+import type { SystemStats } from "../types";
+
+/**
+ * Host-platform classifiers (FU-056 follow-up).
+ *
+ * Centralised so UI gates can ask "is this host capable of running
+ * X?" without sprinkling ``osSystem === "darwin"`` checks across every
+ * Studio + Settings + RuntimeControls surface. Reads from
+ * ``workspace.system`` which the backend already populates from
+ * ``platform.system()`` + ``platform.machine()``.
+ *
+ * The three checks here are the ones that actually gate UI today:
+ *
+ *   - ``isAppleSiliconHost`` — Darwin + arm64. MLX, MLX-LM, MLX-VLM,
+ *     mlx-video, mflux, MTPLX, dflash-mlx, turboquant-mlx-full all
+ *     need Apple Silicon hardware (the MLX framework is limited to
+ *     Metal-backed unified-memory devices). UI install prompts for
+ *     any of these are noise + an install attempt would silently no-op
+ *     on Windows / Linux / Intel Mac.
+ *
+ *   - ``isCudaHost`` — Windows or Linux (x86_64). vLLM, nunchaku,
+ *     sageattention, dflash (CUDA package), triattention, kvpress,
+ *     LongLive all need a CUDA-class GPU. macOS hosts can't reach
+ *     these regardless of GPU brand (no CUDA drivers on macOS).
+ *
+ *   - ``isIntelMac`` — Darwin + x86_64. Rare today but worth gating
+ *     separately because the user gets neither Apple-Silicon-only
+ *     MLX nor (typically) CUDA, so the UI should be honest about
+ *     the empty option set.
+ *
+ * All checks accept a missing / partial ``system`` so the early-paint
+ * skeleton state never crashes a surface. ``undefined`` reads as
+ * "don't show platform-specific affordances yet" — preferable to a
+ * flash of inappropriate UI before the probe lands.
+ */
+
+type SystemLike = Pick<SystemStats, "platform" | "arch"> | null | undefined;
+
+function normalize(value: string | undefined): string {
+  return (value ?? "").trim().toLowerCase();
+}
+
+/** True when ``system`` describes a Darwin host on Apple Silicon
+ * (``arm64`` / ``aarch64``). Negative otherwise — including when
+ * ``system`` is undefined. */
+export function isAppleSiliconHost(system: SystemLike): boolean {
+  if (!system) return false;
+  const platform = normalize(system.platform);
+  const arch = normalize(system.arch);
+  return platform === "darwin" && (arch === "arm64" || arch === "aarch64");
+}
+
+/** True when ``system`` describes a Windows or Linux x86_64 host.
+ * Used to gate CUDA-only install affordances (vLLM, nunchaku, etc.). */
+export function isCudaHost(system: SystemLike): boolean {
+  if (!system) return false;
+  const platform = normalize(system.platform);
+  if (platform !== "windows" && platform !== "linux") return false;
+  const arch = normalize(system.arch);
+  return arch === "x86_64" || arch === "amd64";
+}
+
+/** True when ``system`` describes an Intel Mac (rare in 2026 but
+ * still in use; neither MLX nor CUDA applies). */
+export function isIntelMac(system: SystemLike): boolean {
+  if (!system) return false;
+  const platform = normalize(system.platform);
+  const arch = normalize(system.arch);
+  return platform === "darwin" && (arch === "x86_64" || arch === "amd64");
+}

From e17183a70cd7347e0b0731c71b4a902222e6233e Mon Sep 17 00:00:00 2001
From: Cryptopoly <31970407+cryptopoly@users.noreply.github.com>
Date: Sun, 17 May 2026 20:31:20 +0100
Subject: [PATCH 15/15] feat: hide platform-incompatible catalog variants
 entirely (FU-056)

Extend the FU-034 "hide unrecoverable options" policy to whole
catalog rows. Windows / Linux users no longer see MLX / mlx-video /
mflux / MTPLX entries they can never run, and Apple Silicon users no
longer see vLLM / nunchaku / CUDA-only entries.

- src/utils/platform.ts: imageOrVideoVariantPlatformGate +
  chatVariantPlatformGate + isVariantCompatibleWithHost derive a
  PlatformGate ("apple-silicon" | "cuda" | "any") from existing variant
  fields (runtime / backend / styleTags / repo prefix). No catalog
  schema change required.
- ImageModelsTab / ImageDiscoverTab / VideoModelsTab / VideoDiscoverTab:
  new hostSystem prop, filtered through isVariantCompatibleWithHost in
  the rows/filteredResults useMemo.
- App.tsx: threaded workspace.system into all four tabs;
  libraryChatOptions now also filtered so the launch dropdown drops
  MLX backends on Win/Linux.
- AcceleratorsBoostPack: showIncompatible flipped off; the table now
  surfaces only accelerators the current host can install.

16 new vitest cases pin the helper boundaries (Apple Silicon host
hides CUDA-only variants, Linux x86_64 hides Apple-Silicon-only
variants, "any" gate passes on every host, etc.). All 440 frontend
tests pass; tsc clean.
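
For reviewers, a minimal sketch of the gate check described above. The
PlatformGate union and the backend === "mlx" / "vllm" mapping come from
this patch; the function names (sketchChatGate, sketchIsCompatible),
the HostLike type, and the function bodies are hypothetical and only
illustrate the idea. The pass-through for an unknown host is assumed
from the hostSystem prop comment in ImageModelsTab; the shipped code in
src/utils/platform.ts may differ.

```typescript
// Hypothetical sketch: names and bodies are assumptions, not the shipped code.
type PlatformGate = "apple-silicon" | "cuda" | "any";

interface HostLike {
  platform?: string;
  arch?: string;
}

const norm = (value: string | undefined): string =>
  (value ?? "").trim().toLowerCase();

// Map a chat option to a gate using only its backend field (assumed shape).
function sketchChatGate(option: { backend?: string }): PlatformGate {
  const backend = norm(option.backend);
  if (backend === "mlx") return "apple-silicon";
  if (backend === "vllm") return "cuda";
  return "any";
}

// Decide whether a gated row should be shown on the given host.
function sketchIsCompatible(
  gate: PlatformGate,
  host: HostLike | null | undefined,
): boolean {
  // "any" rows run everywhere; an unknown host passes everything through
  // (assumed early-paint behaviour).
  if (gate === "any" || !host) return true;
  const platform = norm(host.platform);
  const arch = norm(host.arch);
  if (gate === "apple-silicon") {
    return platform === "darwin" && (arch === "arm64" || arch === "aarch64");
  }
  // Remaining case: gate === "cuda" (Windows or Linux on x86_64).
  return (
    (platform === "windows" || platform === "linux") &&
    (arch === "x86_64" || arch === "amd64")
  );
}

// Usage: drop MLX-backed options on a Linux x86_64 host.
const host: HostLike = { platform: "Linux", arch: "x86_64" };
const visible = [{ backend: "mlx" }, { backend: "vllm" }, { backend: "gguf" }]
  .filter((option) => sketchIsCompatible(sketchChatGate(option), host));
```

Deriving the gate in one step and testing it against the host in a
second keeps each tab's filter a one-liner and leaves the catalog
schema untouched.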
---
 src/App.tsx                                   |  20 +++-
 src/features/images/ImageDiscoverTab.tsx      |  18 ++-
 src/features/images/ImageModelsTab.tsx        |  22 +++-
 .../settings/AcceleratorsBoostPack.tsx        |  14 +--
 src/features/video/VideoDiscoverTab.tsx       |  14 +++
 src/features/video/VideoModelsTab.tsx         |  18 ++-
 src/utils/__tests__/platform.test.ts          | 103 +++++++++++++++-
 src/utils/platform.ts                         | 110 ++++++++++++++++++
 8 files changed, 306 insertions(+), 13 deletions(-)

diff --git a/src/App.tsx b/src/App.tsx
index cb0e05f..b972ea7 100644
--- a/src/App.tsx
+++ b/src/App.tsx
@@ -90,6 +90,8 @@ import {
   serverOriginFromBase,
   isUnsavedEmptySession,
   isAppleSiliconHost,
+  isVariantCompatibleWithHost,
+  chatVariantPlatformGate,
 } from "./utils";
 import {
   useWorkspace,
@@ -370,6 +372,12 @@ export default function App() {
   // Only list models present in the local library — catalog-only entries
   // would let the user pick a model that isn't downloaded yet, which then
   // 500s on Load. Discover tab is the place to pull a new model.
+  //
+  // FU-056 follow-up: also filter by host platform. MLX-backed chat
+  // options (``backend === "mlx"``) only run on Apple Silicon; vLLM
+  // options (``backend === "vllm"``) only on CUDA hosts. Without this
+  // filter, Windows users see MLX rows in every model picker that
+  // would error on load.
   const libraryChatOptions: ChatModelOption[] = chatLibrary
     .filter((item) => !item.broken)
     .map((item) => {
@@ -398,7 +406,13 @@ export default function App() {
         // capability badges per option without re-deriving in each view.
         capabilities: resolveCapabilities(canonicalRepo ?? item.name, matched?.capabilities ?? null),
       };
-    });
+    })
+    .filter((option) =>
+      isVariantCompatibleWithHost(
+        chatVariantPlatformGate(option),
+        workspace.system,
+      ),
+    );
 
   const threadModelOptions = libraryChatOptions;
 
@@ -1236,6 +1250,7 @@ export default function App() {
         selectedImageVariant={imgState.selectedImageVariant}
         fileRevealLabel={fileRevealLabel}
         nativeBackends={nativeBackends}
+        hostSystem={workspace.system}
         onActiveTabChange={setActiveTab}
         onOpenImageStudio={imgState.openImageStudio}
         onImageDownload={(repo) => void imgState.handleImageDownload(repo)}
@@ -1253,6 +1268,7 @@ export default function App() {
         activeImageDownloads={imgState.activeImageDownloads}
         fileRevealLabel={fileRevealLabel}
         nativeBackends={nativeBackends}
+        hostSystem={workspace.system}
         onActiveTabChange={setActiveTab}
         onOpenImageStudio={imgState.openImageStudio}
         onImageDownload={(repo) => void imgState.handleImageDownload(repo)}
@@ -1392,6 +1408,7 @@ export default function App() {
         selectedVideoVariant={videoState.selectedVideoVariant}
         fileRevealLabel={fileRevealLabel}
         nativeBackends={nativeBackends}
+        hostSystem={workspace.system}
         longLiveStatus={videoState.longLiveStatus}
         installingLongLive={videoState.installingLongLive}
         longLiveJob={videoState.longLiveJob}
@@ -1418,6 +1435,7 @@ export default function App() {
         loadedVideoVariant={videoState.loadedVideoVariant}
         fileRevealLabel={fileRevealLabel}
         nativeBackends={nativeBackends}
+        hostSystem={workspace.system}
         onActiveTabChange={setActiveTab}
         onOpenVideoStudio={videoState.openVideoStudio}
         onVideoDownload={(repo, modelId) => void videoState.handleVideoDownload(repo, modelId)}
diff --git a/src/features/images/ImageDiscoverTab.tsx b/src/features/images/ImageDiscoverTab.tsx
index b562848..44b12cc 100644
--- a/src/features/images/ImageDiscoverTab.tsx
+++ b/src/features/images/ImageDiscoverTab.tsx
@@ -6,6 +6,7 @@ import { IconActionButton, StatusIcon } from "../../components/ModelActionIcons"
 import type { DownloadStatus } from "../../api";
 import type {
   ImageModelVariant,
+  SystemStats,
   TabId,
 } from "../../types";
 import type {
@@ -26,6 +27,8 @@ import {
   imagePrimarySizeLabel,
   imageSecondarySizeLabel,
   isGatedImageAccessError,
+  imageOrVideoVariantPlatformGate,
+  isVariantCompatibleWithHost,
 } from "../../utils";
 import { AcceleratorCard } from "../../components/AcceleratorCard";
 import {
@@ -55,6 +58,9 @@ export interface ImageDiscoverTabProps {
    * rendered next to each variant. Optional — pre-ready or older
    * backends collapse pills to their "available" form. */
   nativeBackends?: NativeBackendStatus;
+  /** FU-056 follow-up: host platform info for hiding MLX-only /
+   * CUDA-only variants on the wrong host. */
+  hostSystem?: Pick<SystemStats, "platform" | "arch">;
   onActiveTabChange: (tab: TabId) => void;
   onOpenImageStudio: (modelId?: string) => void;
   onImageDownload: (repo: string) => void;
@@ -222,6 +228,7 @@ export function ImageDiscoverTab({
   selectedImageVariant,
   fileRevealLabel,
   nativeBackends,
+  hostSystem,
   onActiveTabChange,
   onOpenImageStudio,
   onImageDownload,
@@ -243,6 +250,15 @@ export function ImageDiscoverTab({
           const memoryEstimate = imageDiscoverMemoryEstimate(variant);
           return { variant, status, memoryEstimate };
         })
+        .filter(({ variant }) =>
+          // FU-056 follow-up: hide mflux-runtime + LTX-2-style
+          // apple-only variants on Win/Linux, nunchaku-only rows on Mac.
+          // "any"-gated rows pass through (the bulk of the catalog).
+          isVariantCompatibleWithHost(
+            imageOrVideoVariantPlatformGate(variant),
+            hostSystem,
+          ),
+        )
         .filter(({ status }) => statusFilter === "all" || status === statusFilter)
         .sort((left, right) => {
           if (imageDiscoverSort === "name") {
@@ -277,7 +293,7 @@ export function ImageDiscoverTab({
           if (dateDiff !== 0) return sortDir === "desc" ? dateDiff : -dateDiff;
           return left.variant.name.localeCompare(right.variant.name);
         }),
-    [activeImageDownloads, combinedImageDiscoverResults, imageDiscoverSort, sortDir, statusFilter],
+    [activeImageDownloads, combinedImageDiscoverResults, imageDiscoverSort, sortDir, statusFilter, hostSystem],
   );
   const hasActiveFilters = imageDiscoverHasActiveFilters || statusFilter !== "all";
 
diff --git a/src/features/images/ImageModelsTab.tsx b/src/features/images/ImageModelsTab.tsx
index df1ecf2..6c4f46f 100644
--- a/src/features/images/ImageModelsTab.tsx
+++ b/src/features/images/ImageModelsTab.tsx
@@ -6,6 +6,7 @@ import type { DownloadStatus } from "../../api";
 import type {
   ImageModelFamily,
   ImageModelVariant,
+  SystemStats,
   TabId,
 } from "../../types";
 import type { NativeBackendStatus } from "../../types/server";
@@ -17,6 +18,8 @@ import {
   imageDiscoverMemoryEstimate,
   imagePrimarySizeLabel,
   imageSecondarySizeLabel,
+  imageOrVideoVariantPlatformGate,
+  isVariantCompatibleWithHost,
 } from "../../utils";
 import { AcceleratorCard } from "../../components/AcceleratorCard";
 import {
@@ -38,6 +41,10 @@ export interface ImageModelsTabProps {
    * backends or pre-ready state) collapses every pill to its
    * "available" form rather than crashing. */
   nativeBackends?: NativeBackendStatus;
+  /** FU-056 follow-up: host platform info for hiding MLX-only /
+   * CUDA-only variants on the wrong host. Optional — undefined
+   * passes everything through (early-paint safety). */
+  hostSystem?: Pick<SystemStats, "platform" | "arch">;
   onActiveTabChange: (tab: TabId) => void;
   onOpenImageStudio: (modelId?: string) => void;
   onImageDownload: (repo: string) => void;
@@ -142,6 +149,7 @@ export function ImageModelsTab({
   activeImageDownloads,
   fileRevealLabel,
   nativeBackends,
+  hostSystem,
   onActiveTabChange,
   onOpenImageStudio,
   onImageDownload,
@@ -180,6 +188,18 @@ export function ImageModelsTab({
         return { variant, family, downloadState, status, memoryEstimate };
       })
       .filter(({ variant, family, status }) => {
+        // FU-056 follow-up: hide variants whose runtime can't run on
+        // this host (mflux on Windows, nunchaku-only on Mac, etc.).
+        // Variants tagged ``"any"`` always pass — that's the bulk of
+        // the catalog (diffusers / sd.cpp / GGUF universal paths).
+        if (
+          !isVariantCompatibleWithHost(
+            imageOrVideoVariantPlatformGate(variant),
+            hostSystem,
+          )
+        ) {
+          return false;
+        }
         if (taskFilter !== "all" && !variant.taskSupport.includes(taskFilter)) return false;
         if (statusFilter !== "all" && status !== statusFilter) return false;
         if (!normalizedSearch) return true;
@@ -222,7 +242,7 @@ export function ImageModelsTab({
         if (dateDiff !== 0) return sortDir === "desc" ? dateDiff : -dateDiff;
         return left.variant.name.localeCompare(right.variant.name);
       });
-  }, [activeImageDownloads, imageCatalog, installedImageVariants, normalizedSearch, sort, sortDir, statusFilter, taskFilter]);
+  }, [activeImageDownloads, imageCatalog, installedImageVariants, normalizedSearch, sort, sortDir, statusFilter, taskFilter, hostSystem]);
 
   return (
     <div className="content-grid image-page-grid">
diff --git a/src/features/settings/AcceleratorsBoostPack.tsx b/src/features/settings/AcceleratorsBoostPack.tsx
index daaaea5..f162a7d 100644
--- a/src/features/settings/AcceleratorsBoostPack.tsx
+++ b/src/features/settings/AcceleratorsBoostPack.tsx
@@ -24,13 +24,12 @@ import type { NativeBackendStatus } from "../../types/server";
  * keyed by ``pipPackage`` — the card itself stays stateless beyond
  * its "log expanded" toggle.
  *
- * The panel intentionally renders **every** entry in
- * ``ACCELERATOR_CATALOG`` regardless of platform (``showIncompatible``
- * is true). The user-experience choice here: this is the diagnostics
- * surface, the user wants visibility into what exists across the
- * ecosystem, not just what their current box can run. Per-feature
- * surfaces will gate by platform so wrong-platform affordances don't
- * appear next to a FLUX model card.
+ * The panel hides entries that can't run on the current host
+ * (``showIncompatible`` is false). Windows / Linux users no longer
+ * see MLX / mlx-video / MTPLX cards they can never install, and
+ * Apple Silicon users no longer see CUDA-only accelerators. Per the
+ * FU-056 platform-filtering policy: don't grey out unrecoverable
+ * options, just hide them.
  */
 
 export interface AcceleratorsBoostPackProps {
@@ -186,7 +185,6 @@ export function AcceleratorsBoostPack({ backendOnline }: AcceleratorsBoostPackPr
                 installError={state.error}
                 installOutput={state.output}
                 onInstall={handleInstall}
-                showIncompatible
               />
             );
           })}
diff --git a/src/features/video/VideoDiscoverTab.tsx b/src/features/video/VideoDiscoverTab.tsx
index a432143..8c67f58 100644
--- a/src/features/video/VideoDiscoverTab.tsx
+++ b/src/features/video/VideoDiscoverTab.tsx
@@ -6,6 +6,7 @@ import { IconActionButton, StatusIcon } from "../../components/ModelActionIcons"
 import { Panel } from "../../components/Panel";
 import type { DownloadStatus, InstallResult, LongLiveJobState } from "../../api";
 import type {
+  SystemStats,
   TabId,
   VideoModelVariant,
   VideoRuntimeStatus,
@@ -25,6 +26,8 @@ import {
   videoDownloadStatusForVariant,
   videoPrimarySizeLabel,
   videoSecondarySizeLabel,
+  imageOrVideoVariantPlatformGate,
+  isVariantCompatibleWithHost,
 } from "../../utils";
 import { AcceleratorCard } from "../../components/AcceleratorCard";
 import {
@@ -61,6 +64,9 @@ export interface VideoDiscoverTabProps {
    * rendered next to each variant. Optional — older backends collapse
    * pills to the "available" form. */
   nativeBackends?: NativeBackendStatus;
+  /** FU-056 follow-up: host platform info for hiding mlx-video /
+   * LTX-2 (apple-only) variants on Win/Linux. */
+  hostSystem?: Pick<SystemStats, "platform" | "arch">;
   longLiveStatus: VideoRuntimeStatus | null;
   installingLongLive: boolean;
   longLiveJob: LongLiveJobState | null;
@@ -246,6 +252,7 @@ export function VideoDiscoverTab({
   selectedVideoVariant,
   fileRevealLabel,
   nativeBackends,
+  hostSystem,
   longLiveStatus,
   installingLongLive,
   longLiveJob,
@@ -280,6 +287,12 @@ export function VideoDiscoverTab({
           const memoryEstimate = videoDiscoverMemoryEstimate(variant);
           return { variant, status, memoryEstimate };
         })
+        .filter(({ variant }) =>
+          isVariantCompatibleWithHost(
+            imageOrVideoVariantPlatformGate(variant),
+            hostSystem,
+          ),
+        )
         .filter(({ status }) => statusFilter === "all" || status === statusFilter)
         .sort((left, right) => {
           if (videoDiscoverSort === "name") {
@@ -317,6 +330,7 @@ export function VideoDiscoverTab({
     [
       activeVideoDownloads,
       combinedVideoDiscoverResults,
+      hostSystem,
       installingLongLive,
       longLiveReady,
       sortDir,
diff --git a/src/features/video/VideoModelsTab.tsx b/src/features/video/VideoModelsTab.tsx
index a94c39e..2e97109 100644
--- a/src/features/video/VideoModelsTab.tsx
+++ b/src/features/video/VideoModelsTab.tsx
@@ -4,6 +4,7 @@ import { Panel } from "../../components/Panel";
 import { IconActionButton, StatusIcon } from "../../components/ModelActionIcons";
 import type { DownloadStatus } from "../../api";
 import type {
+  SystemStats,
   TabId,
   VideoModelFamily,
   VideoModelVariant,
@@ -21,6 +22,8 @@ import {
   videoDownloadStatusForVariant,
   videoPrimarySizeLabel,
   videoSecondarySizeLabel,
+  imageOrVideoVariantPlatformGate,
+  isVariantCompatibleWithHost,
 } from "../../utils";
 import { AcceleratorCard } from "../../components/AcceleratorCard";
 import {
@@ -46,6 +49,9 @@ export interface VideoModelsTabProps {
    * Wan / HunyuanVideo / LTX / CogVideoX / Mochi). Optional — older
    * backends collapse pills to the "available" state. */
   nativeBackends?: NativeBackendStatus;
+  /** FU-056 follow-up: host platform info for hiding MLX-only video
+   * variants (mlx-video / LTX-2 family) on Windows + Linux. */
+  hostSystem?: Pick<SystemStats, "platform" | "arch">;
   onActiveTabChange: (tab: TabId) => void;
   onOpenVideoStudio: (modelId?: string) => void;
   onVideoDownload: (repo: string, modelId?: string) => void;
@@ -166,6 +172,7 @@ export function VideoModelsTab({
   loadedVideoVariant,
   fileRevealLabel,
   nativeBackends,
+  hostSystem,
   onActiveTabChange,
   onOpenVideoStudio,
   onVideoDownload,
@@ -208,6 +215,15 @@ export function VideoModelsTab({
         return { variant, family, downloadState, status, memoryEstimate };
       })
       .filter(({ variant, family, status }) => {
+        // FU-056 follow-up: hide mlx-video / LTX-2 family on Win/Linux.
+        if (
+          !isVariantCompatibleWithHost(
+            imageOrVideoVariantPlatformGate(variant),
+            hostSystem,
+          )
+        ) {
+          return false;
+        }
         if (taskFilter !== "all" && !variant.taskSupport.includes(taskFilter)) return false;
         if (statusFilter !== "all" && status !== statusFilter) return false;
         if (!normalizedSearch) return true;
@@ -250,7 +266,7 @@ export function VideoModelsTab({
         if (dateDiff !== 0) return sortDir === "desc" ? dateDiff : -dateDiff;
         return left.variant.name.localeCompare(right.variant.name);
       });
-  }, [activeVideoDownloads, installedVideoVariants, loadedVideoVariant, normalizedSearch, sort, sortDir, statusFilter, taskFilter, videoCatalog]);
+  }, [activeVideoDownloads, installedVideoVariants, loadedVideoVariant, normalizedSearch, sort, sortDir, statusFilter, taskFilter, videoCatalog, hostSystem]);
 
   return (
     <div className="content-grid image-page-grid">
diff --git a/src/utils/__tests__/platform.test.ts b/src/utils/__tests__/platform.test.ts
index 1e8f867..b33f6bf 100644
--- a/src/utils/__tests__/platform.test.ts
+++ b/src/utils/__tests__/platform.test.ts
@@ -1,6 +1,13 @@
 import { describe, expect, it } from "vitest";
 
-import { isAppleSiliconHost, isCudaHost, isIntelMac } from "../platform";
+import {
+  chatVariantPlatformGate,
+  imageOrVideoVariantPlatformGate,
+  isAppleSiliconHost,
+  isCudaHost,
+  isIntelMac,
+  isVariantCompatibleWithHost,
+} from "../platform";
 
 describe("isAppleSiliconHost", () => {
   it("returns true for Darwin + arm64", () => {
@@ -76,3 +83,97 @@ describe("isIntelMac", () => {
     expect(isIntelMac({ platform: "linux", arch: "x86_64" })).toBe(false);
   });
 });
+
+describe("imageOrVideoVariantPlatformGate", () => {
+  it("mflux runtime → apple-silicon", () => {
+    expect(imageOrVideoVariantPlatformGate({ runtime: "mflux (MLX native)" })).toBe("apple-silicon");
+  });
+
+  it("mlx-video runtime → apple-silicon", () => {
+    expect(imageOrVideoVariantPlatformGate({ runtime: "mlx-video (MLX native)" })).toBe("apple-silicon");
+  });
+
+  it("prince-canuma repos → apple-silicon (LTX-2 family)", () => {
+    expect(imageOrVideoVariantPlatformGate({ repo: "prince-canuma/LTX-2-distilled", runtime: "" }))
+      .toBe("apple-silicon");
+  });
+
+  it("apple-silicon styleTag → apple-silicon", () => {
+    expect(imageOrVideoVariantPlatformGate({ runtime: "", styleTags: ["fast", "apple-silicon"] }))
+      .toBe("apple-silicon");
+  });
+
+  it("nunchaku runtime → cuda", () => {
+    expect(imageOrVideoVariantPlatformGate({ runtime: "diffusers + nunchaku SVDQuant (CUDA)" }))
+      .toBe("cuda");
+  });
+
+  it("cuda styleTag → cuda", () => {
+    expect(imageOrVideoVariantPlatformGate({ runtime: "", styleTags: ["cuda", "int4"] })).toBe("cuda");
+  });
+
+  it("diffusers / sd.cpp / GGUF rows → any", () => {
+    expect(imageOrVideoVariantPlatformGate({ runtime: "diffusers LTXPipeline" })).toBe("any");
+    expect(imageOrVideoVariantPlatformGate({ runtime: "stable-diffusion.cpp (subprocess)" })).toBe("any");
+    expect(imageOrVideoVariantPlatformGate({ runtime: "Stub diffusion pipeline", styleTags: ["gguf"] }))
+      .toBe("any");
+  });
+
+  it("empty / missing variant → any (safe default)", () => {
+    expect(imageOrVideoVariantPlatformGate({})).toBe("any");
+  });
+});
+
+describe("chatVariantPlatformGate", () => {
+  it("mlx backend → apple-silicon", () => {
+    expect(chatVariantPlatformGate({ backend: "mlx" })).toBe("apple-silicon");
+    expect(chatVariantPlatformGate({ backend: "MLX" })).toBe("apple-silicon");
+  });
+
+  it("vllm backend → cuda", () => {
+    expect(chatVariantPlatformGate({ backend: "vllm" })).toBe("cuda");
+  });
+
+  it("llama.cpp / gguf → any", () => {
+    expect(chatVariantPlatformGate({ backend: "llama.cpp" })).toBe("any");
+    expect(chatVariantPlatformGate({ backend: "gguf" })).toBe("any");
+    expect(chatVariantPlatformGate({ backend: "auto" })).toBe("any");
+  });
+
+  it("missing backend → any", () => {
+    expect(chatVariantPlatformGate({})).toBe("any");
+  });
+});
+
+describe("isVariantCompatibleWithHost", () => {
+  const win = { platform: "windows", arch: "x86_64" };
+  const linux = { platform: "linux", arch: "x86_64" };
+  const apple = { platform: "darwin", arch: "arm64" };
+  const intelMac = { platform: "darwin", arch: "x86_64" };
+
+  it("'any' gate passes every host", () => {
+    expect(isVariantCompatibleWithHost("any", win)).toBe(true);
+    expect(isVariantCompatibleWithHost("any", linux)).toBe(true);
+    expect(isVariantCompatibleWithHost("any", apple)).toBe(true);
+    expect(isVariantCompatibleWithHost("any", intelMac)).toBe(true);
+  });
+
+  it("'apple-silicon' gate only passes Apple Silicon", () => {
+    expect(isVariantCompatibleWithHost("apple-silicon", apple)).toBe(true);
+    expect(isVariantCompatibleWithHost("apple-silicon", win)).toBe(false);
+    expect(isVariantCompatibleWithHost("apple-silicon", linux)).toBe(false);
+    expect(isVariantCompatibleWithHost("apple-silicon", intelMac)).toBe(false);
+  });
+
+  it("'cuda' gate passes Win+Linux x86_64, not Mac", () => {
+    expect(isVariantCompatibleWithHost("cuda", win)).toBe(true);
+    expect(isVariantCompatibleWithHost("cuda", linux)).toBe(true);
+    expect(isVariantCompatibleWithHost("cuda", apple)).toBe(false);
+    expect(isVariantCompatibleWithHost("cuda", intelMac)).toBe(false);
+  });
+
+  it("null / undefined system → true (early-paint safety)", () => {
+    expect(isVariantCompatibleWithHost("apple-silicon", null)).toBe(true);
+    expect(isVariantCompatibleWithHost("cuda", undefined)).toBe(true);
+  });
+});
diff --git a/src/utils/platform.ts b/src/utils/platform.ts
index 0a12587..4810527 100644
--- a/src/utils/platform.ts
+++ b/src/utils/platform.ts
@@ -68,3 +68,113 @@ export function isIntelMac(system: SystemLike): boolean {
   const arch = normalize(system.arch);
   return platform === "darwin" && (arch === "x86_64" || arch === "amd64");
 }
+
+
+// ---------------------------------------------------------------------------
+// Catalog-variant platform gates (FU-056 follow-up)
+//
+// The Image / Video / Chat catalogs don't carry an explicit platform
+// field — the routing info lives in ``runtime`` (image/video) and
+// ``backend`` (chat). These helpers normalise that into a single
+// ``"apple-silicon" | "cuda" | "any"`` discriminator so the tab
+// filters + the AcceleratorsBoostPack can use one rule:
+//
+//     keep variant iff (gate === "any") || host-unknown || gate-matches-host
+//
+// "any" includes anything cross-platform — diffusers / llama-server /
+// sd.cpp / GGUF — which is the vast majority of catalog rows.
+// ---------------------------------------------------------------------------
+
+export type PlatformGate = "apple-silicon" | "cuda" | "any";
+
+interface VariantLikeImageOrVideo {
+  runtime?: string | null;
+  styleTags?: string[];
+  repo?: string;
+}
+
+/** Classify an image / video variant by its runtime engine.
+ *
+ * Discriminators (in priority order, first match wins):
+ *   1. ``runtime`` includes ``mflux``, ``mlx-video`` or ``mlx native``
+ *      → Apple Silicon only (those engines don't exist on Win/Linux).
+ *   2. ``repo`` prefix ``prince-canuma/`` → the LTX-2 family, all
+ *      MLX-native (no diffusers mirror today) → Apple Silicon only.
+ *   3. ``styleTags`` carries ``apple-silicon`` → Apple Silicon only;
+ *      catalog curators flag this explicitly on rows where the
+ *      discriminator isn't obvious from runtime alone.
+ *   4. ``runtime`` includes ``nunchaku`` or ``styleTags`` carries
+ *      ``cuda`` → CUDA only (the SVDQuant wheels are CUDA-only; the
+ *      diffusers fallback path is a separate variant in the catalog).
+ *   5. Default ``"any"``: diffusers / sd.cpp / GGUF / Wan-AI base
+ *      rows run on every platform via the universal backends.
+ */
+export function imageOrVideoVariantPlatformGate(variant: VariantLikeImageOrVideo): PlatformGate {
+  const runtime = normalize(variant.runtime ?? "");
+  const tags = (variant.styleTags ?? []).map((t) => t.toLowerCase());
+  const repo = (variant.repo ?? "").toLowerCase();
+
+  if (runtime.includes("mflux") || runtime.includes("mlx-video") || runtime.includes("mlx native")) {
+    return "apple-silicon";
+  }
+  if (repo.startsWith("prince-canuma/")) {
+    return "apple-silicon";
+  }
+  if (tags.includes("apple-silicon")) {
+    return "apple-silicon";
+  }
+  if (runtime.includes("nunchaku") || tags.includes("cuda")) {
+    return "cuda";
+  }
+  return "any";
+}
+
+interface VariantLikeChat {
+  backend?: string | null;
+}
+
+/** Classify a chat option by its inference backend.
+ *
+ *   - ``mlx`` / ``mlx-lm`` / ``mtplx`` → Apple Silicon only (MLX has
+ *     no Win/Linux build; both direct-launch ``mlx-community/*`` and
+ *     convert-then-launch transformers variants share this gate).
+ *   - ``vllm`` → CUDA host (no Windows wheels — the user installs
+ *     into WSL on Windows, native on Linux).
+ *   - everything else (``llama.cpp`` / ``gguf`` / ``transformers``
+ *     for image-runtime-style HF loads / ``auto``) → ``"any"``.
+ */
+export function chatVariantPlatformGate(variant: VariantLikeChat): PlatformGate {
+  const backend = normalize(variant.backend ?? "");
+  if (backend === "mlx" || backend === "mlx-lm" || backend === "mtplx") {
+    return "apple-silicon";
+  }
+  if (backend === "vllm") {
+    return "cuda";
+  }
+  return "any";
+}
+
+/** Cross-cutting "should the UI show this gate's affordances?" check.
+ *
+ * Returns true when the variant runs cleanly on ``system``. The
+ * Apple-Silicon gate accepts any Darwin arm64 host; the CUDA gate
+ * accepts Windows + Linux x86_64; ``"any"`` always passes. Used by
+ * Discover / Models tabs + the AcceleratorsBoostPack to filter
+ * incompatible rows out entirely (per the FU-056 "hide unrecoverable
+ * options" rule).
+ *
+ * Conservative on partial system info: when ``system`` is null /
+ * undefined (early paint before probe lands) we return ``true`` so
+ * the UI doesn't strip variants prematurely. The flash of slightly-
+ * wrong-content is better than a flash of empty Discover.
+ */
+export function isVariantCompatibleWithHost(
+  gate: PlatformGate,
+  system: SystemLike,
+): boolean {
+  if (gate === "any") return true;
+  if (!system) return true; // early-paint safety
+  if (gate === "apple-silicon") return isAppleSiliconHost(system);
+  if (gate === "cuda") return isCudaHost(system);
+  return true;
+}