Feature/accelerator install ux by cryptopoly · Pull Request #58 · cryptopoly/ChaosEngineAI

cryptopoly · 2026-05-17T22:03:37Z

No description provided.

Foundation for in-app install UX. Lazy importability + version probes for nunchaku / sageattention / dflash-mlx / dflash-cuda / triattention / kvpress, plus a Windows-only wsl2 detector that seeds the upcoming vLLM-via-WSL bridge. Eleven new fields on BackendCapabilities surface through /api/health; the placeholder probe primes them on first paint so the UI never flashes Install for a package that is actually present. Probes resilient to the half-baked-install failure mode we hit on Windows (torch directory present but Python source missing): find_spec swallows ValueError, version reads swallow ImportError and missing __version__. DFlash MLX vs CUDA flags delegate to the existing dflash.is_mlx_available / dflash.is_vllm_available helpers so the upstream package-layout dance stays in one place. Tests: 25 in tests/test_accelerator_capabilities.py covering present / absent / broken-install / WSL-status branches.
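The broken-install resilience described above can be sketched as follows. This is a minimal illustration, not the code in accelerators.py: the function names `is_importable` and `read_version` are hypothetical, and catching ImportError inside the spec lookup is an extra guard assumed here for dotted module names.

```python
import importlib
import importlib.util
from typing import Optional


def is_importable(module_name: str) -> bool:
    """Locate a module without importing it; never raise on a broken install."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ValueError:
        # find_spec raises ValueError when the module sits in sys.modules
        # with __spec__ set to None (one shape a half-baked install can take).
        return False
    except ImportError:
        # Assumption: also treat a failing parent-package import as "absent".
        return False


def read_version(module_name: str) -> Optional[str]:
    """Return the module's __version__, or None if it cannot be read."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    version = getattr(module, "__version__", None)
    return str(version) if version is not None else None
```

Both helpers return a plain value in every branch, so a capability payload built from them can always be serialised even when a package directory exists without its Python sources.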

Tests should exercise the same install users have, not a parallel .venv install. New tests/conftest.py calls ensure_extras_on_sys_path at collection time, so pytest tests/ resolves torch / diffusers / mlx / nunchaku / sageattention / triattention / vllm against the persistent extras dir at:
- Windows: %LOCALAPPDATA%\ChaosEngineAI\extras\cp{XY}\site-packages
- macOS: ~/Library/Application Support/ChaosEngineAI/extras/cp{XY}/site-packages
- Linux: ${XDG_DATA_HOME}/ChaosEngineAI/extras/cp{XY}/site-packages
A torch upgrade landing via the in-app installer is reflected in the next pytest run automatically; no pip install dance in .venv. On a fresh CI box without the extras dir the conftest is a silent no-op, so existing test boxes keep working. Set CHAOSENGINE_TEST_TRACE_EXTRAS=1 to log which extras path got loaded for a given run.
Runners (e2e_test_suite.py, cache-strategy-matrix.py) now print an actionable hint when the backend is not reachable ("open the ChaosEngineAI app") rather than just "backend not reachable; aborting". Both still exit 2/3 respectively so CI gates stay reliable.
Docs (testing/overview.md, testing/e2e-testing.md) updated with the canonical open-the-app-then-run-tests flow, with the headless dev backend kept as an advanced option for contributors.
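A rough sketch of how the per-OS extras path could be resolved and added at collection time. The helper names are hypothetical (the real entry point is ensure_extras_on_sys_path), and three details are assumptions: that cp{XY} means major/minor of the running interpreter, that Linux falls back to ~/.local/share when XDG_DATA_HOME is unset, and that the path is inserted at the front of sys.path.

```python
import os
import sys
from pathlib import PurePath, PurePosixPath, PureWindowsPath
from typing import List, Mapping, Optional

_TAIL = ("ChaosEngineAI", "extras")


def python_tag() -> str:
    # Assumption: cp{XY} is the major/minor of the running interpreter.
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


def extras_site_packages(
    platform: str, env: Mapping[str, str], home: str, py_tag: str
) -> Optional[PurePath]:
    """Compute the persistent extras site-packages dir for a given host."""
    tail = (*_TAIL, py_tag, "site-packages")
    if platform == "win32":
        local = env.get("LOCALAPPDATA")
        return PureWindowsPath(local, *tail) if local else None
    if platform == "darwin":
        return PurePosixPath(home, "Library", "Application Support", *tail)
    # Assumption: standard XDG fallback when the variable is unset.
    data_home = env.get("XDG_DATA_HOME") or f"{home}/.local/share"
    return PurePosixPath(data_home, *tail)


def add_if_present(path: Optional[PurePath], sys_path: List[str]) -> bool:
    """Silent no-op when the extras dir does not exist (fresh CI box)."""
    if path is None or not os.path.isdir(str(path)):
        return False
    if str(path) not in sys_path:
        sys_path.insert(0, str(path))
    return True
```

In a conftest this would be called once at import time, for example `add_if_present(extras_site_packages(sys.platform, os.environ, os.path.expanduser("~"), python_tag()), sys.path)`.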

Reusable card for the six accelerators (nunchaku, sageattention, dflash-mlx, dflash-cuda, triattention, kvpress). Three placement variants share one component so the per-feature surfaces in Phases 3-6 stay in sync without re-implementing the four states (idle / installing / installed / failed) per surface:
- card: full banner with title, claim, applies-to, size pill, primary action. Lands in the Image / Video Studio runtime banners and the Diagnostics Boost Pack.
- pill: compact horizontal chip with 4-bit-style copy. Lands on catalog variant cards in the Discover / Models tabs.
- row: table form for Diagnostics Boost Pack's scannable view.
State ownership: parent owns the install lifecycle (which package is in flight, success/failure, captured pip output). The card only owns the log-expanded toggle. Mirrors the CudaTorchLogPanel contract so the card is cheap to render in many places without duplicating polling work.
New catalog (src/components/acceleratorCatalog.ts) is the single source of truth for each accelerator's pip name, capability flag, speedup claim, size, install mode, and platform gate. Adding a seventh accelerator is one entry here, one Phase 1 capability flag, and one row in the backend's _INSTALLABLE_PIP_PACKAGES.
NativeBackendStatus (src/types/server.ts) extended with the 13 FU-056 Phase 1 fields plus the older vllm/mtplx/ggufMtp fields that were already on the wire but missing from the TS interface. All fields optional so a backend running an older build than the frontend doesn't break the type contract.
Tests (28 new): catalog shape pinning + getAccelerator lookup + isPlatformCompatible matrix + readInstalled / readVersion / platformLabel / actionLabelFor branch coverage. Vitest harness stays at pure-function level - no React Testing Library yet, per the existing src/components/__tests__/ convention.
CSS: .accelerator-card / -pill / -row variants in styles.css, matching the existing .torch-upgrade-pill colour vocabulary (rgba(80, 140, 220, ...) for the not-installed accent, rgba(80, 180, 100, ...) for installed, --border + --surface tokens for the chrome).

First end-to-end UX slice for FU-056. The Diagnostics tab gains a Boost Pack section listing all six accelerators (nunchaku, sageattention, dflash-mlx, dflash-cuda, triattention, kvpress) as a single scannable table. Status pill + Install / Retry button per row; a click installs via the existing POST /api/setup/install-package endpoint, output is captured into a collapsible details element, then capabilities re-probe so the "Installed v1.2.1" pill flips without a parent refetch.
Self-probes capabilities on mount via refreshCapabilities() so the panel works standalone: DiagnosticsPanel only passes backendOnline. Per-accelerator install state lives in a record keyed by pip name, so multiple installs can run concurrently if the user is impatient (the backend serialises pip writes at the OS-FS layer).
Renders every catalog row with showIncompatible=true: this is the "see everything" surface, not a per-feature gate. Apple-Silicon and CUDA accelerators both list; the platform column tells the user which apply to their box, and the disabled state + tooltip block an ill-fitting install. Phases 3-5 will filter per surface.
Closes the first observable loop: Phase 1 probe → Phase 2 card (row variant) → install → re-probe → installed state. The same component renders pill + card + row, so the per-feature surfaces in Phases 3-5 ride the same diff.
No new tests: the pure logic (readInstalled, readVersion, actionLabelFor, platformLabel, isPlatformCompatible) is already pinned by Phase 2's 28 unit tests. The Boost Pack itself is wiring: fetch capabilities, dispatch install, re-fetch on success. Mirrors the existing CudaTorchLogPanel pattern.

Wires accelerator install affordances into the three Image surfaces users actually look at when picking + running a model: 1. Image Models tab — every installed FLUX / SD3.5 / Qwen-Image / SANA / PixArt row gets read-only pills next to the style tags: "🚀 SVDQuant 4-bit" + "🚀 Fast attention DiT" when the accelerator is missing, "✓ ..." when present. UNet pipelines (SD1.5 / SDXL) show no pills — neither nunchaku nor sageattention applies. 2. Image Discover tab — same pills on catalog variant cards in the same position. Lets users see acceleration potential before committing to a download. 3. Image Studio runtime banner — new "Performance boosters" section between the torch-upgrade pill and the model-load summary. Card variants of the same accelerators with full Install / Retry buttons. Self-contained install state: clicks POST /api/setup/install-package, capture the response capabilities, and overlay them onto the parent-provided snapshot so the card flips to "✓ Installed v..." without waiting for the next workspace refetch. The pills on the Models / Discover tabs are deliberately read-only — the install action lives in Studio's runtime banner so install state stays concentrated. A new optional onInstall prop on AcceleratorCard drives this: when omitted, the card renders as passive info. New helper getApplicableAccelerators(repo) maps a model repo to the accelerator IDs that apply. Pattern-matches on the family slug (FLUX.1, sd3.5, qwen-image, sana, pixart-sigma) so we don't have to edit catalog/image_models.py to land this — the catalog-side recommendedAccelerators metadata pattern is reserved for Phase 7 when the i18n + per-variant overrides land together. 7 new unit tests pin the matrix (FLUX, SD3.5, Qwen-Image, SANA, PixArt for nunchaku+sageattention; Wan / HunyuanVideo / LTX / CogVideoX / Mochi for sageattention-only; Wan2.1-T2V-1.3B for the triattention LongLive bonus; SDXL / SD1.5 return empty). 
NativeBackendStatus threads from App.tsx → ImageModelsTab, ImageDiscoverTab, ImageStudioTab → ImageStudioRuntimeBanner → ImageStudioBoosters. The prop is optional everywhere so older backends without FU-056 Phase 1 fields collapse pills to their "available" state rather than crashing the tab. Deferred to a follow-up commit: the post-generation suggestion toast (fires when a non-Nunchaku FLUX gen takes >12s on CUDA). The discovery + install surfaces in this commit already give users a clean path to install accelerators contextually; the toast adds a nudge but the install affordance is reachable without it.
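The repo-to-accelerator matrix pinned by the 7 unit tests can be sketched like this. The real getApplicableAccelerators helper is TypeScript; this version is written in Python only for illustration. The image-family patterns come from the commit message, while the video patterns and the exact regexes are assumptions.

```python
import re
from typing import List

# Image DiT families named in the commit message.
_IMAGE_DIT = re.compile(r"flux\.1|sd3\.5|qwen-image|sana|pixart-sigma", re.IGNORECASE)
# Assumption: video families matched by these slugs.
_VIDEO_DIT = re.compile(r"wan\d|hunyuanvideo|ltx|cogvideox|mochi", re.IGNORECASE)
# The one repo that gets the triattention LongLive bonus.
_LONGLIVE = re.compile(r"wan2\.1-t2v-1\.3b", re.IGNORECASE)


def get_applicable_accelerators(repo: str) -> List[str]:
    """Map a model repo id to the accelerator ids that apply to it."""
    if _IMAGE_DIT.search(repo):
        return ["nunchaku", "sageattention"]
    if _VIDEO_DIT.search(repo):
        ids = ["sageattention"]
        if _LONGLIVE.search(repo):
            ids.append("triattention")
        return ids
    # UNet pipelines (SD1.5 / SDXL) and anything unrecognised get no pills.
    return []
```

Matching on the repo string keeps the catalog schema untouched, at the cost of needing a new pattern whenever a family is added.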

Mirrors the Image-side wiring from Phase 3 onto the Video tabs:
1. Video Models tab - every Wan / HunyuanVideo / LTX / CogVideoX / Mochi row gets read-only accelerator pills next to the style tags. SageAttention applies to all CUDA video DiTs; TriAttention surfaces specifically on Wan 2.1 T2V 1.3B for the LongLive real-time long-clip mode.
2. Video Discover tab - same pills on catalog variant cards in the same chip-row position.
3. Video Studio runtime banner - new "Performance boosters" section between the torch-upgrade pill and the LongLive install row. Full card variants with working Install / Retry buttons + collapsible pip output.
Implementation note: the booster section was identical to the image-side equivalent (same install state machine, same card rendering, same overlay-on-install-success pattern). Renamed ImageStudioBoosters -> MediaStudioBoosters and moved to src/components/ so both surfaces share one file. The component now takes a minimal {repo, name?} variant slice rather than a concrete ImageModelVariant / VideoModelVariant - both shapes carry those fields and the booster logic doesn't need anything else. One source of truth for the install / overlay / re-probe dance.
NativeBackendStatus threads from App.tsx -> VideoDiscoverTab, VideoModelsTab, VideoStudioTab -> VideoStudioRuntimeBanner -> MediaStudioBoosters. Prop is optional everywhere so older backends without FU-056 Phase 1 fields collapse pills to their "available" state rather than crashing the tab.
No new tests required - the getApplicableAccelerators repo-pattern matrix is already pinned by Phase 3's 7 tests, including all the relevant video repos (Wan2.1-T2V-1.3B with triattention bonus, Wan2.2-T2V-A14B without, HunyuanVideo, LTX-Video, CogVideoX, Mochi). MediaStudioBoosters internals match the previous ImageStudioBoosters, no behavioural changes.

Brings the in-app accelerator install affordance to the chat surface. When the user is chatting with a model that has a registered DFlash draft AND the appropriate pip package isn't installed yet, an unobtrusive nudge bar appears above the prompt textarea: DFlash speculative decoding can ~2x this model with no quality loss. [Install DFlash] Click installs the right package for the active backend (``dflash-mlx`` on Apple Silicon MLX, ``dflash`` on CUDA vLLM) via the existing ``handleInstallPackage`` dispatcher. The bar self-hides when the package lands and capabilities re-probe. Twin gating logic to the AcceleratorCard pattern: the hint only renders when all three signals line up (model in supportedModels, package missing for active backend, supported backend). The backend probe + ``resolveDflashSupport`` helper already exist from FU-034; this commit wires them into the composer. Drive-by fix in RuntimeControls.tsx: the existing "Install DFlash" button next to the launch-settings toggle hard-coded ``onInstallPackage("dflash-mlx")``, which silently installed the Apple-Silicon package on CUDA / Windows boxes running vLLM. Both the launch-settings button and the new composer hint now route through a shared ``dflashPackageFor(backend)`` helper that picks the right package per backend. 3 new unit tests pin the matrix (mlx -> dflash-mlx, vllm -> dflash, null / unknown -> dflash-mlx as safe default). Net change for the user: discover acceleration potential from the place where you generate (chat composer / studio runtime banner / catalog cards), not from a settings page you have to remember to visit.
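The package-selection matrix pinned by the 3 unit tests is small enough to show in full. The real dflashPackageFor helper is TypeScript; this Python rendering is illustrative only, and exact string matching on the backend name is an assumption.

```python
from typing import Optional


def dflash_package_for(backend: Optional[str]) -> str:
    """Pick the DFlash pip package for the active backend."""
    if backend == "vllm":
        # CUDA / vLLM hosts get the CUDA package.
        return "dflash"
    # mlx, None, and unknown backends fall back to dflash-mlx,
    # which the commit message describes as the safe default.
    return "dflash-mlx"
```

Routing both the launch-settings button and the composer hint through one function is what removes the hard-coded `"dflash-mlx"` from the click handler.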

vLLM ships no native Windows wheels; this commit lets Windows users install vLLM into an isolated WSL venv with one click. Three pieces:
1. **Detector** (backend_service/inference/accelerators.py): four new probes layered on top of the existing wsl2_available helper:
   - wsl_default_distro() reads "Default Distribution: Ubuntu-X" out of the UTF-16 ``wsl --status`` output
   - wsl_cuda_available() runs ``wsl -- nvidia-smi -L`` to confirm CUDA passthrough is working inside the distro
   - wsl_vllm_available() runs an ``import vllm`` inside the managed venv at ~/.chaosengine/vllm-venv
   - wsl_vllm_version() reads __version__ from the same venv
   Four matching fields on BackendCapabilities (wslDistroName, wslCudaAvailable, wslVllmAvailable, wslVllmVersion). The detail probes shell out via wsl.exe and can take a few seconds on a cold WSL service start, so they're gated behind a wsl2_active short-circuit: hosts without WSL pay zero subprocess cost.
2. **Install endpoint** (backend_service/routes/setup/vllm_wsl.py): POST /api/setup/install-vllm-wsl + /status. Background-thread job with five steps:
   - preflight (verify CUDA visible in WSL)
   - venv (python3 -m venv ~/.chaosengine/vllm-venv)
   - pip-upgrade (pip + setuptools + wheel)
   - pip-vllm (the long one, ~2 GB / 5-15 min)
   - verify (import vllm)
   Same single-job semantics as install-longlive: a second POST while running returns the running job state. The venv is rooted in the WSL user's $HOME (ext4-backed) so CUDA torch wheels don't pay the ~10x IO penalty of being on /mnt/c/.
3. **WslBridgePanel** (src/features/settings/WslBridgePanel.tsx): Windows-only Setup panel rendered alongside the Boost Pack on the Diagnostics tab. Four bucket states:
   - WSL2 not installed → ``wsl --install`` copy-paste hint + MS docs
   - WSL2 ready, no CUDA → NVIDIA WSL driver kicker link
   - WSL2 + CUDA ready, vLLM missing → one-click install button
   - vLLM ready → green pill with version + "Reinstall" affordance
   Self-probes capabilities on mount, polls install status at 1.5 Hz while a job is in flight, refreshes capabilities on completion so the bucket flips without a parent refetch. Uses the existing InstallLogPanel for log tail (extended to accept the new "vllm-wsl" variant).
Tests: 12 new probe tests covering the present / absent / cold-host matrix for each WSL detail probe, plus 4 endpoint tests pinning the job-state shape + the Windows platform gate + the start/status contract.
Live-verified on Windows + RTX 4090: detector returns ``distro=Ubuntu-24.04, cuda=True, vllm=False, version=None``, which is correct for the dev box right now.
Deferred to a follow-up commit: the actual engine routing so a vLLM model load transparently launches inside the WSL venv. This commit ships only the install path so users can stand up the venv today; the engine wiring needs careful path translation (/mnt/c/Users/... → Windows paths) and stdout streaming that deserves its own focused PR.
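A minimal sketch of the distro-name parse that wsl_default_distro() is described as doing. The function name `parse_default_distro` and the regex are hypothetical, and the sketch assumes the English "Default Distribution:" label; localised Windows builds may print something else.

```python
import re
from typing import Optional

_DEFAULT_DISTRO = re.compile(r"^Default Distribution:\s*(\S.*?)\s*$", re.MULTILINE)


def parse_default_distro(raw: bytes) -> Optional[str]:
    """Extract the default distro name from raw `wsl --status` bytes."""
    # wsl.exe emits UTF-16-LE; decoding as UTF-8 would leave a NUL after
    # every ASCII character and the regex would never match.
    text = raw.decode("utf-16-le", errors="ignore")
    # Drop a leading byte-order mark if one is present.
    text = text.replace("\ufeff", "")
    match = _DEFAULT_DISTRO.search(text)
    return match.group(1) if match else None
```

The bytes would typically come from `subprocess.run(["wsl", "--status"], capture_output=True).stdout`, called only after the cheap WSL-presence check has passed.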

Completes the WSL bridge so Windows users get transparent vLLM inference. A model load with backend=vllm on Windows + wslVllm installed transparently spawns the OpenAI-compatible server inside the WSL Ubuntu venv and proxies /v1/chat/completions through it. No user action beyond clicking "Install vLLM in WSL" once. Three pieces:
1. **VllmWslEngine** (backend_service/inference/vllm_wsl_engine.py): HTTP-bridge engine modelled on MtplxEngine. Subprocess shape:
   wsl -- ~/.chaosengine/vllm-venv/bin/python -m vllm.entrypoints.openai.api_server --model <ref> --host 127.0.0.1 --port <free> --max-model-len <ctx> --trust-remote-code
   WSL2 mirrors loopback to the Windows host so the Windows backend reaches the listener at 127.0.0.1:<port> without any port-forward ceremony. Implements both generate() and stream_generate() so the existing chat surface stream path works end to end.
2. **windows_path_to_wsl helper**: a local model at C:\Users\Dan\AI_Models\Qwen3-7B gets translated to /mnt/c/Users/Dan/AI_Models/Qwen3-7B before being passed to vLLM, so a Windows-side download is reachable from inside WSL. HF repo ids (Qwen/Qwen3.5-7B) pass through unchanged - vLLM downloads them into its WSL-native HF cache, which avoids the ~10x IO penalty of /mnt/c-based cache reads.
3. **Routing** (backend_service/inference/controller.py): when ``hint == "vllm"`` the controller now prefers VllmWslEngine on Windows + wslVllmAvailable=True, falling through to the in-process VLLMEngine on Linux. On Windows boxes without the bridge, the error message points the user at Diagnostics → WSL2 vLLM bridge instead of the bare "pip install vllm" hint that doesn't work on Windows.
Speculative decoding via the WSL bridge isn't wired yet - the in-process VLLMEngine uses vllm.LLM's speculative_config= kwarg, but the OpenAI server entry-point uses --speculative-model / --num-speculative-tokens which need separate wiring. The runtime note honestly flags the gap rather than silently dropping requests.
Tests: 13 new in test_vllm_wsl_engine.py covering:
- windows_path_to_wsl matrix (backslash, forward-slash, drive casing, WSL passthrough, repo-id passthrough, UNC, relative)
- load_model platform gate (off-Windows rejects)
- load_model capability gate (wslVllm missing rejects)
- argv composition (every required vllm flag present + ordered)
- happy-path lifecycle (Popen called once, /health polled, LoadedModelInfo populated correctly, pid reachable)
- path translation on a Windows model path
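The path translation can be sketched in a few lines. This is an illustration of the technique rather than the shipped helper: the commit message only specifies the drive-path and repo-id cases, so passing UNC and relative paths through unchanged is an assumption here.

```python
import re

# A drive letter, a colon, then either slash style.
_DRIVE_PATH = re.compile(r"^([A-Za-z]):[\\/](.*)$")


def windows_path_to_wsl(ref: str) -> str:
    """Translate C:\\... style paths to /mnt/c/...; leave everything else alone."""
    match = _DRIVE_PATH.match(ref)
    if match is None:
        # HF repo ids and existing /mnt/... paths pass through unchanged.
        # Assumption: UNC and relative paths are also left untouched.
        return ref
    drive = match.group(1).lower()  # WSL mounts drives under lowercase names
    rest = match.group(2).replace("\\", "/")
    return f"/mnt/{drive}/{rest}".rstrip("/")
```

Because a repo id such as `Qwen/Qwen3.5-7B` contains no drive prefix, it never matches the pattern and reaches vLLM exactly as the user typed it.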

Caught during a live end-to-end test on a fresh Ubuntu 24.04 WSL install: ``python3 -m venv ~/.chaosengine/vllm-venv`` fails with ``ensurepip is not available`` because Ubuntu 24.04 ships python3 without the venv module. Before this commit the user would see a confusing error mid-install ("Failed to create the WSL venv. See output above.") with the real fix buried in stderr. Now the preflight step explicitly probes ``python3 -c 'import ensurepip'`` after the CUDA check. When it fails, the install endpoint surfaces the exact apt command: sudo apt update && sudo apt install -y python3-venv instead of trying to create the venv and erroring out. Same pattern as the existing NVIDIA-driver-not-found path: tell the user what to do, don't pretend to recover.

…se 8)
End-to-end test validated against real CUDA + real vLLM 0.21.0 + real WSL2 Ubuntu-24.04 on Windows + RTX 4090. Loaded Qwen2.5-0.5B-Instruct in 96 s and generated "Paris." for the prompt "The capital of France is", with a 1.19 s HTTP round-trip from the Windows backend into WSL and back.
Four fixes the live test surfaced, none of which would have been caught by mocked unit tests:
1. **PATH plumbing through grandchild processes**: the engine subprocess inside vLLM (EngineCore) couldn't find ``ninja`` for flashinfer's JIT-compiled sampling kernels, even though it lived in the venv's bin/. The command builder now wraps the python invocation in ``bash -c`` so we can prepend ``~/.chaosengine/vllm-venv/bin`` to PATH explicitly. The PATH value is double-quoted because WSL2 interop injects the Windows PATH into bash, and that PATH contains paths with spaces (``/mnt/c/Program Files/NVIDIA…``) which otherwise word-split into ``export: 'Files/NVIDIA': not a valid identifier`` errors.
2. **vLLM 0.21+ flashinfer JIT escape hatches**: even with ninja reachable, flashinfer needs ``nvcc`` for the second compile stage. Setting ``VLLM_USE_FLASHINFER_SAMPLER=0`` + ``VLLM_ATTENTION_BACKEND=TORCH_SDPA`` routes through pre-built PyTorch kernels. ``--enforce-eager`` disables CUDA-graph compilation. Loses some perf but avoids the second JIT.
3. **/v1/models probe instead of /health**: vLLM's ``/health`` returns 200 with an empty body, which tripped ``_http_json``'s ``json.loads`` and made ``_wait_for_server`` keep retrying until the timeout. ``/v1/models`` returns the loaded-model list as JSON so the parse succeeds and we return on first OK.
4. **shlex-quoted model arg**: a model path with spaces (e.g. a Windows-translated ``/mnt/c/My Models/Qwen3-7B``) would word-split through the bash -c parse without quoting. New test pins the round-trip.
Plus the install endpoint's preflight already grew a clear "sudo apt install python3-venv" message (last commit), caught the same way, just earlier in the chain.
New file ``scripts/live_e2e_vllm_wsl.py``: not part of the regular test suite; a one-shot script that probes capabilities, constructs the engine, loads a tiny chat-tuned model (Qwen/Qwen2.5-0.5B-Instruct), generates from a deterministic prompt, prints metrics, and tears down. Run from Windows + WSL with vllm-venv installed: ``.venv\Scripts\python.exe scripts\live_e2e_vllm_wsl.py``. Exit 0 on success, 1 with full traceback on failure.
Tests: 15 in test_vllm_wsl_engine.py still pass (3 lifecycle + 3 command-shape + 5 path-translation + 2 platform-gate + 2 capability-gate). All 42 in the wider WSL-bridge test files green.
Live-test run output:
  Loaded in 96.3s
  engine: vllm-wsl
  runtimeNote: vLLM 0.21.0 running inside WSL (Ubuntu-24.04).
  pid: 34036
  port: 58586
  text: 'Paris.'
  finishReason: stop
  promptTokens: 34
  completionTokens: 3
  responseSeconds: 1.19
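Fixes 1, 2 and 4 all live in the command builder, which might look roughly like the sketch below. The function name `build_wsl_command` is hypothetical, invoking plain `python` and relying on the prepended PATH is an assumption, and any extra quoting needed to carry the script through wsl.exe from a Windows process is not covered.

```python
import shlex
from typing import List

# $HOME is expanded by bash inside WSL, not by the Windows host.
VENV_BIN = "$HOME/.chaosengine/vllm-venv/bin"


def build_wsl_command(model: str, port: int, max_model_len: int) -> List[str]:
    """Build an argv that runs the vLLM OpenAI server under `bash -c` in WSL."""
    server_args = [
        "python", "-m", "vllm.entrypoints.openai.api_server",
        "--model", model,
        "--host", "127.0.0.1",
        "--port", str(port),
        "--max-model-len", str(max_model_len),
        "--trust-remote-code",
        "--enforce-eager",
    ]
    script = (
        # Double quotes keep Windows PATH entries with spaces in one word.
        f'export PATH="{VENV_BIN}:$PATH"; '
        "export VLLM_USE_FLASHINFER_SAMPLER=0 VLLM_ATTENTION_BACKEND=TORCH_SDPA; "
        # shlex.quote protects a model path that contains spaces.
        "exec " + " ".join(shlex.quote(arg) for arg in server_args)
    )
    return ["wsl", "--", "bash", "-c", script]
```

Quoting each argument individually, rather than the joined string, is what lets a translated path such as `/mnt/c/My Models/Qwen3-7B` survive the second shell parse as a single argument.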

Closes the test-coverage gap on everything FU-056 has shipped over the previous eight commits. Three small additions across the existing test-gate scripts: 1. **scripts/cache-strategy-matrix.py** — capability probe now considers the WSL vLLM bridge a valid vllm provider. Without this, all four vllm matrix cells would skip with "vLLM not installed (CUDA-only)" on Windows boxes even though the bridge route works (validated by the live e2e in commit c4f3701). New ``wsl_vllm_available`` field on BackendCapabilities; the skip-reason copy now names both routes so a user reading a skip-row knows their actionable next step regardless of OS. 2. **scripts/pre-build-check.mjs [5/8]** — extended with a new sub-probe that walks ``src/components/acceleratorCatalog.ts`` for every (pipPackage, capabilityField) pair and asserts each one exists in (a) the backend's _INSTALLABLE_PIP_PACKAGES allow-list and (b) the BackendCapabilities dataclass. Surface: ``PASS Accelerator catalog ↔ backend (6 entries)``. Catches drift: adding a 7th catalog row without wiring its pip package + capability flag would fail the gate at build time rather than at first user click. Six entries today (nunchaku, sageattention, dflash-mlx, dflash-cuda, triattention, kvpress). 3. **scripts/e2e_test_suite.py phase 6** — two new read-only probes alongside the existing 7: - ``vllm-wsl-status``: GETs /api/setup/install-vllm-wsl/status and asserts the JSON shape (phase + done fields present). Verifies the Phase 8 install endpoint at minimum returns the expected schema even when no install has been started. - ``fu-056-capability-flags``: GETs /api/health and asserts all 7 FU-056 Phase 1 capability fields are present on ``nativeBackends``. The fields are optional in the schema (older backends shouldn't crash the frontend), but the gate ensures release builds expose them. Phase 6 grows from 7 to 9 checks. Verified live against the user's running backend: PASS 9/9. No new test files. 
Phase 9 is gate plumbing on existing scripts.

Caught during the live WSL test sweep: ``backend_service.app.main()`` hard-coded ``port=DEFAULT_PORT`` in the ``uvicorn.run`` call and ignored the ``--port`` flag the test scripts have been passing. Worked historically because DEFAULT_PORT already reads the ``CHAOSENGINE_PORT`` env var, so test runs that set the env var got the right port, but ``python -m backend_service.app --port 8877`` silently bound 8876.
Now ``main()`` uses argparse with env-var fallbacks:
  --port → $CHAOSENGINE_PORT → 8876
  --host → $CHAOSENGINE_HOST → 127.0.0.1
CLI > env > default. Surfaces ``--help`` properly (the user can discover the args). The existing env-var path keeps working for the Tauri shell + headless test scripts that already set ``CHAOSENGINE_PORT``.
Three new helper scripts under ``scripts/`` for the WSL dev workflow:
- ``install_llama_server_wsl.sh``: downloads the latest llama.cpp Linux release into ``~/.chaosengine/bin/`` for the WSL backend.
- ``run_backend_wsl.sh``: launches the backend on port 8877 with auth disabled (env: ``CHAOSENGINE_REQUIRE_AUTH=0``), pointing at the WSL-side llama-server. Detached via nohup + disown.
- ``probe_backend_wsl.sh``: diagnostic helper; runs the backend in the foreground for 3 s and surfaces import / bind errors.
WSL test sweep results (Ubuntu-24.04, RTX 4090, vllm-venv at 0.21.0):
- pytest tests/: 1472 passed, 21 failed, 21 skipped (49 more passes than Windows, thanks to fewer platform-specific failures)
- e2e_test_suite.py --smoke: 6/0/0 PASS including the two new FU-056 Phase 9 phase-6 probes (vllm-wsl-status + capability flags)
- cache-strategy-matrix.py --quick: 0/0 ran, 15/15 skipped honestly (only ``native`` strategy in dev venv; no turbo binary, no dflash, no models in dev library; all skip reasons accurate)
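The CLI > env > default precedence can be sketched with argparse by feeding the environment value in as each flag's default. The function name `parse_server_args` and the injectable `env` parameter are illustrative choices, not necessarily how ``main()`` is structured.

```python
import argparse
import os
from typing import Mapping, Optional, Sequence


def parse_server_args(
    argv: Optional[Sequence[str]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> argparse.Namespace:
    """Resolve host/port with precedence: CLI flag > env var > built-in default."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(prog="backend_service.app")
    # The env var (or the hard default) becomes the argparse default,
    # so an explicit flag always wins.
    parser.add_argument("--host", default=env.get("CHAOSENGINE_HOST", "127.0.0.1"))
    parser.add_argument(
        "--port", type=int, default=int(env.get("CHAOSENGINE_PORT", "8876"))
    )
    return parser.parse_args(argv)
```

The resulting namespace would then be passed on, for example as `uvicorn.run(app, host=args.host, port=args.port)`, instead of the previously hard-coded port.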

Two related UX cleanups landed together because they share the same plumbing pattern (App.tsx → tabs → leaf components):
1. **Hide MTPLX install affordances on Windows / Linux.** The MTPLX block in RuntimeControls (the launch-settings modal that opens from Chat / Compare / HTML Challenge / Benchmarks) used to render the MTPLX checkbox + "Install MTPLX" button + info disclosure on every host. MTPLX is Apple-Silicon-only: the install would error on Windows and the checkbox would render disabled with no path to recovery. Per the FU-034 rule (hide unrecoverable options, don't grey them out), the whole block is now gated on a new ``isAppleSilicon`` prop threaded from App.tsx via:
   App.tsx → LaunchModal / CompareView / HtmlChallengeTab / BenchmarkRunTab → ChallengePickerModal (for HtmlChallengeTab) → ModelLaunchModal → RuntimeControls
   Three call sites on RuntimeControls (the MTPLX label, the info-panel expand, the info button) now ALL gate on the prop. ``dflash-mlx`` was already platform-gated via the FU-056 Phase 2 AcceleratorCard catalog (platformGate: "apple-silicon").
2. **Chat empty-state banner.** Fresh-install users opening the Chat tab used to see "Send a message to start the conversation." followed by a silent auto-load of the largest MLX direct variant (a 15+ GB download that doesn't even work on Windows/Linux, where the MLX backend doesn't exist). Replaced with a ``<ChatEmptyStateBanner>`` that surfaces a clear CTA: "Browse Discover" when the library is empty, "Open Models" when models are present but none loaded. No silent auto-loads, no confused users waiting on the wrong download. The banner is purely additive: the composer textarea above it stays usable, so users can still type while the banner suggests Discover.
Plumbing:
- New ``src/utils/platform.ts`` with ``isAppleSiliconHost``, ``isCudaHost``, ``isIntelMac`` helpers. Reads from ``workspace.system`` (platform + arch) which the backend already populates from ``platform.system()`` + ``platform.machine()``.
- 15 unit tests in ``src/utils/__tests__/platform.test.ts`` pin every host-classifier branch (Darwin arm64, aarch64, Intel Mac, Windows, Linux, null/undefined, case-insensitive).
- ``isAppleSiliconHost(workspace.system)`` computed once at App.tsx top-level, threaded as ``isAppleSilicon`` prop to the four call sites that own MTPLX surfaces.
- New ``<ChatEmptyStateBanner>`` component with two states (no-models / no-loaded-model), each with appropriate CTA.
Tests: 35 files / 424 vitest tests pass (+15 from platform helper). tsc clean. No new pytest needed: backend unchanged.
Not addressed in this commit (deferred):
- MLX-only image / video catalog variants still surface in Discover / Models tabs on Win/Linux. Filtering those is a larger UX call (hide entirely vs. show with an "Apple Silicon only" pill) that deserves its own decision before code.
- "llama-server installed by default": already the case via scripts/stage-runtime.mjs for release builds. No code change.
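The host-classifier branches listed above can be sketched as below. The real helpers are TypeScript in ``src/utils/platform.ts``; this Python rendering is for illustration, and the dictionary keys ``platform`` / ``arch`` as well as the accepted architecture strings are assumptions based on the values ``platform.system()`` and ``platform.machine()`` usually return.

```python
from typing import Mapping, Optional

_ARM_ARCHES = {"arm64", "aarch64"}
_INTEL_ARCHES = {"x86_64", "amd64"}


def _field(system: Optional[Mapping[str, Optional[str]]], key: str) -> str:
    """Read a field case-insensitively, tolerating missing data."""
    if not system:
        return ""
    return str(system.get(key) or "").strip().lower()


def is_apple_silicon_host(system: Optional[Mapping[str, Optional[str]]]) -> bool:
    """True only for macOS on an ARM CPU."""
    return _field(system, "platform") == "darwin" and _field(system, "arch") in _ARM_ARCHES


def is_intel_mac(system: Optional[Mapping[str, Optional[str]]]) -> bool:
    """True only for macOS on an x86-64 CPU."""
    return _field(system, "platform") == "darwin" and _field(system, "arch") in _INTEL_ARCHES
```

Returning False for missing or unknown data means a host that has not reported its platform yet hides the Apple-Silicon-only controls rather than showing ones that cannot work.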

Extends the FU-034 "hide unrecoverable options" policy to whole catalog rows. Windows / Linux users no longer see MLX / mlx-video / mflux / MTPLX entries they can never run, and Apple Silicon users no longer see vLLM / nunchaku / CUDA-only entries.
- src/utils/platform.ts: imageOrVideoVariantPlatformGate + chatVariantPlatformGate + isVariantCompatibleWithHost derive a PlatformGate ("apple-silicon" | "cuda" | "any") from existing variant fields (runtime / backend / styleTags / repo prefix). No catalog schema change required.
- ImageModelsTab / ImageDiscoverTab / VideoModelsTab / VideoDiscoverTab: new hostSystem prop, filtered through isVariantCompatibleWithHost in the rows/filteredResults useMemo.
- App.tsx: threaded workspace.system into all four tabs; libraryChatOptions now also filtered so the launch dropdown drops MLX backends on Win/Linux.
- AcceleratorsBoostPack: showIncompatible flipped off, so the table now surfaces only accelerators the current host can install.
16 new vitest cases pin the helper boundaries (Apple Silicon host hides CUDA-only variants, Linux x86_64 hides Apple-Silicon-only variants, "any" gate passes on every host, etc). All 440 frontend tests pass; tsc clean.

cryptopoly added 15 commits May 17, 2026 10:47

cryptopoly merged commit 4fda709 into staging May 17, 2026

cryptopoly merged 15 commits into staging from feature/accelerator-install-ux
