Feature/accelerator install ux#58
Merged
Foundation for in-app install UX. Lazy importability + version probes for nunchaku / sageattention / dflash-mlx / dflash-cuda / triattention / kvpress, plus a Windows-only wsl2 detector that seeds the upcoming vLLM-via-WSL bridge.
Eleven new fields on BackendCapabilities surface through /api/health; the placeholder probe primes them on first paint so the UI never flashes Install for a package that is actually present.
Probes resilient to the half-baked-install failure mode we hit on Windows (torch directory present but Python source missing): find_spec swallows ValueError, version reads swallow ImportError and missing __version__.
DFlash MLX vs CUDA flags delegate to the existing dflash.is_mlx_available / dflash.is_vllm_available helpers so the upstream package-layout dance stays in one place.
Tests: 25 in tests/test_accelerator_capabilities.py covering present / absent / broken-install / WSL-status branches.
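A minimal sketch of the probe pattern described above, not the project's actual code. The function names `is_importable` and `read_version` are hypothetical; only the swallowed exception types come from the commit message. Catching ImportError in the importability check is an extra assumption for dotted names whose parent package is missing.

```python
import importlib
import importlib.util
from typing import Optional


def is_importable(module_name: str) -> bool:
    """Return True if the module can be located, without importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ValueError:
        # find_spec raises ValueError when sys.modules[name].__spec__ is None,
        # which is one way a half-installed package can present itself.
        return False
    except ImportError:
        # Assumption: also treat a missing parent package as "not installed".
        return False


def read_version(module_name: str) -> Optional[str]:
    """Return the module's __version__ string, or None if unavailable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    version = getattr(module, "__version__", None)
    return str(version) if version is not None else None
```

Both helpers return a plain value instead of raising, so a capability payload built from them can always be serialised even on a broken install.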
Tests should exercise the same install users have, not a parallel
.venv install. New tests/conftest.py calls ensure_extras_on_sys_path
at collection time, so pytest tests/ resolves torch / diffusers /
mlx / nunchaku / sageattention / triattention / vllm against the
persistent extras dir at:
Windows: %LOCALAPPDATA%\ChaosEngineAI\extras\cp{XY}\site-packages
macOS: ~/Library/Application Support/ChaosEngineAI/extras/cp{XY}/site-packages
Linux: ${XDG_DATA_HOME}/ChaosEngineAI/extras/cp{XY}/site-packages
A torch upgrade landing via the in-app installer is reflected in the
next pytest run automatically; no pip install dance in .venv. On a
fresh CI box without the extras dir the conftest is a silent no-op,
so existing test boxes keep working.
Set CHAOSENGINE_TEST_TRACE_EXTRAS=1 to log which extras path got
loaded for a given run.
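The per-OS layout above can be sketched as a pure function. This is an illustration only: `extras_site_packages` and `add_extras_dir` are hypothetical names (the real entry point is ensure_extras_on_sys_path), the `~/.local/share` fallback for an unset XDG_DATA_HOME is an assumption taken from the XDG default, and whether the real helper appends or prepends to sys.path is not stated in the commit.

```python
import os
from pathlib import PurePosixPath, PureWindowsPath
from typing import List, Mapping


def extras_site_packages(platform: str, env: Mapping[str, str], home: str, py_tag: str) -> str:
    """Build the extras site-packages path for one OS (py_tag is e.g. 'cp312')."""
    if platform == "win32":
        root = PureWindowsPath(env["LOCALAPPDATA"])
        return str(root / "ChaosEngineAI" / "extras" / py_tag / "site-packages")
    if platform == "darwin":
        root = PurePosixPath(home) / "Library" / "Application Support"
    else:
        # Assumption: fall back to the XDG default when XDG_DATA_HOME is unset.
        root = PurePosixPath(env.get("XDG_DATA_HOME") or f"{home}/.local/share")
    return str(root / "ChaosEngineAI" / "extras" / py_tag / "site-packages")


def add_extras_dir(path: str, sys_path: List[str]) -> bool:
    """Append the extras dir if it exists; silent no-op otherwise."""
    if not os.path.isdir(path) or path in sys_path:
        return False
    sys_path.append(path)
    return True
```

Passing `platform`, `env`, and `home` as arguments keeps the path logic testable on any host; a conftest would call it with `sys.platform`, `os.environ`, and the user's home directory.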
Runners (e2e_test_suite.py, cache-strategy-matrix.py) now print an
actionable hint when the backend is not reachable: open the
ChaosEngineAI app, rather than just backend not reachable; aborting.
Both still exit 2/3 respectively so CI gates stay reliable.
Docs (testing/overview.md, testing/e2e-testing.md) updated with the
canonical open-the-app-then-run-tests flow, with the headless dev
backend kept as an advanced option for contributors.
Reusable card for the six CUDA-side accelerators (nunchaku,
sageattention, dflash-mlx, dflash-cuda, triattention, kvpress).
Three placement variants share one component so the per-feature
surfaces in Phases 3-6 stay in sync without re-implementing the
four states (idle / installing / installed / failed) per surface:
- card: full banner with title, claim, applies-to, size pill,
primary action. Lands in the Image / Video Studio runtime
banners and the Diagnostics Boost Pack.
- pill: compact horizontal chip with 4-bit-style copy. Lands on
catalog variant cards in the Discover / Models tabs.
- row: table form for Diagnostics Boost Pack's scannable view.
State ownership: parent owns the install lifecycle (which package
is in flight, success/failure, captured pip output). The card
only owns the log-expanded toggle. Mirrors the CudaTorchLogPanel
contract so the card is cheap to render in many places without
duplicating polling work.
New catalog (src/components/acceleratorCatalog.ts) is the single
source of truth for each accelerator's pip name, capability flag,
speedup claim, size, install mode, and platform gate. Adding a
seventh accelerator is one entry here, one Phase 1 capability
flag, and one row in the backend's _INSTALLABLE_PIP_PACKAGES.
NativeBackendStatus (src/types/server.ts) extended with the 13
FU-056 Phase 1 fields plus the older vllm/mtplx/ggufMtp fields
that were already on the wire but missing from the TS interface.
All fields optional so a backend running an older build than the
frontend doesn't break the type contract.
Tests (28 new): catalog shape pinning + getAccelerator lookup +
isPlatformCompatible matrix + readInstalled / readVersion /
platformLabel / actionLabelFor branch coverage. Vitest harness
stays at pure-function level - no React Testing Library yet, per
the existing src/components/__tests__/ convention.
CSS: .accelerator-card / -pill / -row variants in styles.css,
matching the existing .torch-upgrade-pill colour vocabulary
(rgba(80, 140, 220, ...) for the not-installed accent,
rgba(80, 180, 100, ...) for installed, --border + --surface
tokens for the chrome).
First end-to-end UX slice for FU-056. The Diagnostics tab gains a Boost Pack section listing all six CUDA-side accelerators (nunchaku, sageattention, dflash-mlx, dflash-cuda, triattention, kvpress) as a single scannable table. Status pill + Install / Retry button per row; click installs via the existing POST /api/setup/install-package endpoint, output captured into a collapsible details, then capabilities re-probe so the "Installed v1.2.1" pill flips without a parent refetch.
Self-probes capabilities on mount via refreshCapabilities() so the panel works standalone; DiagnosticsPanel only passes backendOnline. Per-accelerator install state lives in a record keyed by pip name, so multiple installs can run concurrently if the user is impatient (the backend serialises pip writes at the OS-FS layer).
Renders every catalog row with showIncompatible=true: this is the "see everything" surface, not a per-feature gate. Apple-Silicon and CUDA accelerators both list; the platform column tells the user which apply to their box, and disabled state + tooltip blocks an ill-fitting install. Phases 3-5 will filter per surface.
Closes the first observable loop: Phase 1 probe → Phase 2 card (row variant) → install → re-probe → installed state. Same Component renders pill + card + row, so the per-feature surfaces in Phases 3-5 ride the same diff.
No new tests; the pure logic (readInstalled, readVersion, actionLabelFor, platformLabel, isPlatformCompatible) is already pinned by Phase 2's 28 unit tests. The Boost Pack itself is wiring: fetch capabilities, dispatch install, re-fetch on success. Mirrors the existing CudaTorchLogPanel pattern.
Wires accelerator install affordances into the three Image surfaces
users actually look at when picking + running a model:
1. Image Models tab — every installed FLUX / SD3.5 / Qwen-Image /
SANA / PixArt row gets read-only pills next to the style tags:
"🚀 SVDQuant 4-bit" + "🚀 Fast attention DiT" when the
accelerator is missing, "✓ ..." when present. UNet pipelines
(SD1.5 / SDXL) show no pills — neither nunchaku nor
sageattention applies.
2. Image Discover tab — same pills on catalog variant cards in
the same position. Lets users see acceleration potential
before committing to a download.
3. Image Studio runtime banner — new "Performance boosters"
section between the torch-upgrade pill and the model-load
summary. Card variants of the same accelerators with full
Install / Retry buttons. Self-contained install state: clicks
POST /api/setup/install-package, capture the response
capabilities, and overlay them onto the parent-provided
snapshot so the card flips to "✓ Installed v..." without
waiting for the next workspace refetch.
The pills on the Models / Discover tabs are deliberately
read-only — the install action lives in Studio's runtime banner so
install state stays concentrated. A new optional onInstall prop on
AcceleratorCard drives this: when omitted, the card renders as
passive info.
New helper getApplicableAccelerators(repo) maps a model repo to the
accelerator IDs that apply. Pattern-matches on the family slug
(FLUX.1, sd3.5, qwen-image, sana, pixart-sigma) so we don't have to
edit catalog/image_models.py to land this — the catalog-side
recommendedAccelerators metadata pattern is reserved for Phase 7
when the i18n + per-variant overrides land together. 7 new unit
tests pin the matrix (FLUX, SD3.5, Qwen-Image, SANA, PixArt for
nunchaku+sageattention; Wan / HunyuanVideo / LTX / CogVideoX /
Mochi for sageattention-only; Wan2.1-T2V-1.3B for the triattention
LongLive bonus; SDXL / SD1.5 return empty).
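The repo-to-accelerator matrix above can be sketched as follows. The real helper is the TypeScript getApplicableAccelerators; this is a Python rendering for illustration only, and the substring matching on the repo's final path segment is an assumption about how the family slugs are compared.

```python
from typing import List

# Family slugs taken from the commit message; the matching rule is assumed.
_IMAGE_DIT_SLUGS = ("flux.1", "sd3.5", "qwen-image", "sana", "pixart-sigma")
_VIDEO_DIT_SLUGS = ("wan", "hunyuanvideo", "ltx", "cogvideox", "mochi")
_LONGLIVE_SLUG = "wan2.1-t2v-1.3b"


def applicable_accelerators(repo: str) -> List[str]:
    """Map a model repo id to the accelerator ids that apply to it."""
    name = repo.rsplit("/", 1)[-1].lower()
    if any(slug in name for slug in _IMAGE_DIT_SLUGS):
        return ["nunchaku", "sageattention"]
    if any(slug in name for slug in _VIDEO_DIT_SLUGS):
        ids = ["sageattention"]
        if _LONGLIVE_SLUG in name:
            # TriAttention is the LongLive bonus on Wan 2.1 T2V 1.3B only.
            ids.append("triattention")
        return ids
    # UNet pipelines (SD1.5 / SDXL) and unknown repos get no accelerators.
    return []
```

Keeping the mapping in one pure function is what lets a small unit-test matrix pin every family without touching the model catalog.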
NativeBackendStatus threads from App.tsx → ImageModelsTab,
ImageDiscoverTab, ImageStudioTab → ImageStudioRuntimeBanner →
ImageStudioBoosters. The prop is optional everywhere so older
backends without FU-056 Phase 1 fields collapse pills to their
"available" state rather than crashing the tab.
Deferred to a follow-up commit: the post-generation suggestion
toast (fires when a non-Nunchaku FLUX gen takes >12s on CUDA). The
discovery + install surfaces in this commit already give users a
clean path to install accelerators contextually; the toast adds a
nudge but the install affordance is reachable without it.
Mirrors the Image-side wiring from Phase 3 onto the Video tabs:
1. Video Models tab - every Wan / HunyuanVideo / LTX / CogVideoX /
Mochi row gets read-only accelerator pills next to the style
tags. SageAttention applies to all CUDA video DiTs;
TriAttention surfaces specifically on Wan 2.1 T2V 1.3B for the
LongLive real-time long-clip mode.
2. Video Discover tab - same pills on catalog variant cards in
the same chip-row position.
3. Video Studio runtime banner - new "Performance boosters"
section between the torch-upgrade pill and the LongLive
install row. Full card variants with working Install / Retry
buttons + collapsible pip output.
Implementation note: the booster section was identical to the
image-side equivalent (same install state machine, same card
rendering, same overlay-on-install-success pattern). Renamed
ImageStudioBoosters -> MediaStudioBoosters and moved to
src/components/ so both surfaces share one file. The component
now takes a minimal {repo, name?} variant slice rather than a
concrete ImageModelVariant / VideoModelVariant - both shapes
carry those fields and the booster logic doesn't need anything
else. One source of truth for the install / overlay / re-probe
dance.
NativeBackendStatus threads from App.tsx -> VideoDiscoverTab,
VideoModelsTab, VideoStudioTab -> VideoStudioRuntimeBanner ->
MediaStudioBoosters. Prop is optional everywhere so older
backends without FU-056 Phase 1 fields collapse pills to their
"available" state rather than crashing the tab.
No new tests required - the getApplicableAccelerators repo-pattern
matrix is already pinned by Phase 3's 7 tests, including all six
relevant video repos (Wan2.1-T2V-1.3B with triattention bonus,
Wan2.2-T2V-A14B without, HunyuanVideo, LTX-Video, CogVideoX,
Mochi). MediaStudioBoosters internals match the previous
ImageStudioBoosters, no behavioural changes.
Brings the in-app accelerator install affordance to the chat
surface. When the user is chatting with a model that has a
registered DFlash draft AND the appropriate pip package isn't
installed yet, an unobtrusive nudge bar appears above the prompt
textarea:
DFlash speculative decoding can ~2x this model with no quality
loss. [Install DFlash]
Click installs the right package for the active backend
(``dflash-mlx`` on Apple Silicon MLX, ``dflash`` on CUDA vLLM)
via the existing ``handleInstallPackage`` dispatcher. The bar
self-hides when the package lands and capabilities re-probe.
Twin gating logic to the AcceleratorCard pattern: the hint only
renders when all three signals line up (model in supportedModels,
package missing for active backend, supported backend). The
backend probe + ``resolveDflashSupport`` helper already exist
from FU-034; this commit wires them into the composer.
Drive-by fix in RuntimeControls.tsx: the existing "Install DFlash"
button next to the launch-settings toggle hard-coded
``onInstallPackage("dflash-mlx")``, which silently installed the
Apple-Silicon package on CUDA / Windows boxes running vLLM. Both
the launch-settings button and the new composer hint now route
through a shared ``dflashPackageFor(backend)`` helper that picks
the right package per backend. 3 new unit tests pin the matrix
(mlx -> dflash-mlx, vllm -> dflash, null / unknown -> dflash-mlx
as safe default).
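The backend-to-package matrix pinned by those three tests is small enough to show in full. The real helper is the TypeScript ``dflashPackageFor(backend)``; the Python name and dictionary below are a hypothetical rendering of the same mapping.

```python
from typing import Optional

_DFLASH_PACKAGE_BY_BACKEND = {"mlx": "dflash-mlx", "vllm": "dflash"}


def dflash_package_for(backend: Optional[str]) -> str:
    """Pick the DFlash pip package for the active backend."""
    # None and unknown backends fall back to dflash-mlx, the stated safe default.
    return _DFLASH_PACKAGE_BY_BACKEND.get(backend or "", "dflash-mlx")
```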
Net change for the user: discover acceleration potential from
the place where you generate (chat composer / studio runtime
banner / catalog cards), not from a settings page you have to
remember to visit.
vLLM ships no native Windows wheels; this commit lets Windows users
install vLLM into an isolated WSL venv with one click. Three pieces:
1. **Detector** (backend_service/inference/accelerators.py):
four new probes layered on top of the existing wsl2_available
helper:
- wsl_default_distro() reads "Default Distribution: Ubuntu-X" out
of the UTF-16 ``wsl --status`` output
- wsl_cuda_available() runs ``wsl -- nvidia-smi -L`` to confirm
CUDA passthrough is working inside the distro
- wsl_vllm_available() runs an ``import vllm`` inside the managed
venv at ~/.chaosengine/vllm-venv
- wsl_vllm_version() reads __version__ from the same venv
Four matching fields on BackendCapabilities (wslDistroName,
wslCudaAvailable, wslVllmAvailable, wslVllmVersion). The detail
probes shell out via wsl.exe and can take a few seconds on a
cold WSL service start, so they're gated behind a wsl2_active
short-circuit — hosts without WSL pay zero subprocess cost.
2. **Install endpoint** (backend_service/routes/setup/vllm_wsl.py):
POST /api/setup/install-vllm-wsl + /status. Background-thread job
with five steps:
- preflight (verify CUDA visible in WSL)
- venv (python3 -m venv ~/.chaosengine/vllm-venv)
- pip-upgrade (pip + setuptools + wheel)
- pip-vllm (the long one, ~2 GB / 5-15 min)
- verify (import vllm)
Same single-job semantics as install-longlive: a second POST
while running returns the running job state. The venv is rooted
in the WSL user's $HOME (ext4-backed) so CUDA torch wheels don't
pay the ~10x IO penalty of being on /mnt/c/.
3. **WslBridgePanel** (src/features/settings/WslBridgePanel.tsx):
Windows-only Setup panel rendered alongside the Boost Pack on
the Diagnostics tab. Four bucket states:
- WSL2 not installed → ``wsl --install`` copy-paste hint + MS docs
- WSL2 ready, no CUDA → NVIDIA WSL driver kicker link
- WSL2 + CUDA ready, vLLM missing → one-click install button
- vLLM ready → green pill with version + "Reinstall" affordance
Self-probes capabilities on mount, polls install status at 1.5 Hz
while a job is in flight, refreshes capabilities on completion so
the bucket flips without a parent refetch. Uses the existing
InstallLogPanel for log tail (extended to accept the new
"vllm-wsl" variant).
Tests: 12 new probe tests covering the present / absent / cold-host
matrix for each WSL detail probe, plus 4 endpoint tests pinning the
job-state shape + the Windows platform gate + the start/status
contract. Live-verified on Windows + RTX 4090: detector returns
``distro=Ubuntu-24.04, cuda=True, vllm=False, version=None`` —
correct for the dev box right now.
Deferred to a follow-up commit: the actual engine routing so a
vLLM model load transparently launches inside the WSL venv. This
commit ships only the install path so users can stand up the venv
today; the engine wiring needs careful path translation
(/mnt/c/Users/... → Windows paths) and stdout streaming that
deserves its own focused PR.
Completes the WSL bridge so Windows users get transparent vLLM
inference. A model load with backend=vllm on Windows + wslVllm
installed transparently spawns the OpenAI-compatible server inside
the WSL Ubuntu venv and proxies /v1/chat/completions through it.
No user action beyond clicking "Install vLLM in WSL" once.
Three pieces:
1. **VllmWslEngine** (backend_service/inference/vllm_wsl_engine.py):
HTTP-bridge engine modelled on MtplxEngine. Subprocess shape:
wsl -- ~/.chaosengine/vllm-venv/bin/python
-m vllm.entrypoints.openai.api_server
--model <ref> --host 127.0.0.1 --port <free>
--max-model-len <ctx> --trust-remote-code
WSL2 mirrors loopback to the Windows host so the Windows backend
reaches the listener at 127.0.0.1:<port> without any port-forward
ceremony. Implements both generate() and stream_generate() so the
existing chat surface stream path works end to end.
2. **windows_path_to_wsl helper**: a local model at
C:\Users\Dan\AI_Models\Qwen3-7B gets translated to
/mnt/c/Users/Dan/AI_Models/Qwen3-7B before being passed to vLLM,
so a Windows-side download is reachable from inside WSL. HF repo
ids (Qwen/Qwen3.5-7B) pass through unchanged - vLLM downloads them
into its WSL-native HF cache, which avoids the ~10x IO penalty
of /mnt/c-based cache reads.
3. **Routing** (backend_service/inference/controller.py): when
``hint == "vllm"`` the controller now prefers VllmWslEngine on
Windows + wslVllmAvailable=True, falling through to the in-process
VLLMEngine on Linux. On Windows boxes without the bridge, the
error message points the user at Diagnostics → WSL2 vLLM bridge
instead of the bare "pip install vllm" hint that doesn't work on
Windows.
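A minimal sketch of the path translation from item 2, assuming a simple drive-letter rewrite. The function name matches the helper named above, but the body is illustrative: in particular, passing UNC and relative paths through unchanged is an assumption, since the commit only says those cases are tested, not what they return.

```python
import re

_DRIVE_PATH = re.compile(r"^([A-Za-z]):[\\/](.*)$")


def windows_path_to_wsl(ref: str) -> str:
    """Translate C:\\... or C:/... to /mnt/c/...; leave everything else alone."""
    match = _DRIVE_PATH.match(ref)
    if not match:
        # HF repo ids, WSL-native paths, and (by assumption) UNC / relative
        # paths pass through unchanged.
        return ref
    drive, rest = match.groups()
    rest = rest.replace("\\", "/")
    return f"/mnt/{drive.lower()}/{rest}".rstrip("/")
```

Requiring a separator right after the drive colon is what keeps a repo id such as Qwen/Qwen3.5-7B from being mistaken for a drive path.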
Speculative decoding via the WSL bridge isn't wired yet - the
in-process VLLMEngine uses vllm.LLM's speculative_config= kwarg, but
the OpenAI server entry-point uses --speculative-model /
--num-speculative-tokens which need separate wiring. The runtime
note honestly flags the gap rather than silently dropping requests.
Tests: 13 new in test_vllm_wsl_engine.py covering:
- windows_path_to_wsl matrix (backslash, forward-slash, drive
casing, WSL passthrough, repo-id passthrough, UNC, relative)
- load_model platform gate (off-Windows rejects)
- load_model capability gate (wslVllm missing rejects)
- argv composition (every required vllm flag present + ordered)
- happy-path lifecycle (Popen called once, /health polled,
LoadedModelInfo populated correctly, pid reachable)
- path translation on a Windows model path
Caught during a live end-to-end test on a fresh Ubuntu 24.04 WSL
install: ``python3 -m venv ~/.chaosengine/vllm-venv`` fails with
``ensurepip is not available`` because Ubuntu 24.04 ships python3
without the venv module. Before this commit the user would see a
confusing error mid-install ("Failed to create the WSL venv. See
output above.") with the real fix buried in stderr.
Now the preflight step explicitly probes ``python3 -c 'import
ensurepip'`` after the CUDA check. When it fails, the install
endpoint surfaces the exact apt command:
sudo apt update && sudo apt install -y python3-venv
instead of trying to create the venv and erroring out. Same
pattern as the existing NVIDIA-driver-not-found path: tell the
user what to do, don't pretend to recover.
End-to-end test validated against real CUDA + real vLLM 0.21.0 +
real WSL2 Ubuntu-24.04 on Windows + RTX 4090. Loaded
Qwen2.5-0.5B-Instruct in 96 s and generated "Paris." for the
prompt "The capital of France is" — 1.19 s HTTP round-trip from
the Windows backend into WSL and back.
Four fixes the live test surfaced, none of which would have been
caught by mocked unit tests:
1. **PATH plumbing through grandchild processes**: the engine
subprocess inside vLLM (EngineCore) couldn't find ``ninja`` for
flashinfer's JIT-compiled sampling kernels, even though it lived
in the venv's bin/. The command builder now wraps the python
invocation in ``bash -c`` so we can prepend
``~/.chaosengine/vllm-venv/bin`` to PATH explicitly. The PATH
value is double-quoted because WSL2 interop injects the Windows PATH
into bash, and that PATH contains entries with spaces
(``/mnt/c/Program Files/NVIDIA…``) which otherwise word-split
into ``export: 'Files/NVIDIA': not a valid identifier`` errors.
2. **vLLM 0.21+ flashinfer JIT escape hatches**: even with ninja
reachable, flashinfer needs ``nvcc`` for the second compile
stage. Setting ``VLLM_USE_FLASHINFER_SAMPLER=0`` +
``VLLM_ATTENTION_BACKEND=TORCH_SDPA`` routes through pre-built
PyTorch kernels. ``--enforce-eager`` disables CUDA-graph
compilation. Loses some perf but avoids the second JIT.
3. **/v1/models probe instead of /health**: vLLM's ``/health``
returns 200 with an empty body, which tripped ``_http_json``'s
``json.loads`` and made ``_wait_for_server`` retry indefinitely
until the timeout. ``/v1/models`` returns the loaded-model list
as JSON so the parse succeeds and we return on first OK.
4. **shlex-quoted model arg**: a model path with spaces (e.g. a
Windows-translated ``/mnt/c/My Models/Qwen3-7B``) would
word-split through the bash -c parse without quoting. New test
pins the round-trip.
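Fixes 1, 2 and 4 all land in the command builder, which can be sketched as below. This is an illustration only: `build_server_script` and `build_wsl_argv` are hypothetical names, ``$HOME`` is used in place of ``~`` because tilde does not expand inside double quotes, and how wsl.exe forwards the quoting from the Windows side is not modelled.

```python
import shlex
from typing import List

_VENV_BIN = "$HOME/.chaosengine/vllm-venv/bin"


def build_server_script(model: str, port: int, max_model_len: int) -> str:
    """Compose the bash -c script that launches the vLLM OpenAI server."""
    server = " ".join([
        f'exec "{_VENV_BIN}/python"', "-m", "vllm.entrypoints.openai.api_server",
        "--model", shlex.quote(model),  # survives spaces in translated paths
        "--host", "127.0.0.1", "--port", str(port),
        "--max-model-len", str(max_model_len),
        "--trust-remote-code", "--enforce-eager",
    ])
    return "; ".join([
        # Double quotes keep Windows PATH entries with spaces in one word.
        f'export PATH="{_VENV_BIN}:$PATH"',
        "export VLLM_USE_FLASHINFER_SAMPLER=0 VLLM_ATTENTION_BACKEND=TORCH_SDPA",
        server,
    ])


def build_wsl_argv(model: str, port: int, max_model_len: int) -> List[str]:
    """Wrap the script so the whole thing runs through bash inside WSL."""
    return ["wsl", "--", "bash", "-c", build_server_script(model, port, max_model_len)]
```

Splitting the script back with `shlex.split` and checking that the token after ``--model`` equals the original path is one way to pin the round-trip in a test.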
Plus the install endpoint's preflight already grew a clear
"sudo apt install python3-venv" message (last commit) — caught the
same way, just earlier in the chain.
New file ``scripts/live_e2e_vllm_wsl.py`` — not part of the
regular test suite; one-shot script that probes capabilities,
constructs the engine, loads a tiny chat-tuned model
(Qwen/Qwen2.5-0.5B-Instruct), generates a deterministic prompt,
prints metrics, tears down. Run from Windows + WSL with
vllm-venv installed: ``.venv\Scripts\python.exe scripts\live_e2e_vllm_wsl.py``.
Exit 0 on success, 1 with full traceback on failure.
Tests: 15 in test_vllm_wsl_engine.py still pass (3 lifecycle +
3 command-shape + 5 path-translation + 2 platform-gate +
2 capability-gate). All 42 in the wider WSL-bridge test files green.
Live-test run output:
Loaded in 96.3s
engine: vllm-wsl
runtimeNote: vLLM 0.21.0 running inside WSL (Ubuntu-24.04).
pid: 34036
port: 58586
text: 'Paris.'
finishReason: stop
promptTokens: 34
completionTokens: 3
responseSeconds: 1.19
Closes the test-coverage gap on everything FU-056 has shipped over the previous eight commits. Three small additions across the existing test-gate scripts:
1. **scripts/cache-strategy-matrix.py**: capability probe now considers the WSL vLLM bridge a valid vllm provider. Without this, all four vllm matrix cells would skip with "vLLM not installed (CUDA-only)" on Windows boxes even though the bridge route works (validated by the live e2e in commit c4f3701). New ``wsl_vllm_available`` field on BackendCapabilities; the skip-reason copy now names both routes so a user reading a skip-row knows their actionable next step regardless of OS.
2. **scripts/pre-build-check.mjs [5/8]**: extended with a new sub-probe that walks ``src/components/acceleratorCatalog.ts`` for every (pipPackage, capabilityField) pair and asserts each one exists in (a) the backend's _INSTALLABLE_PIP_PACKAGES allow-list and (b) the BackendCapabilities dataclass. Surface: ``PASS Accelerator catalog ↔ backend (6 entries)``. Catches drift: adding a 7th catalog row without wiring its pip package + capability flag would fail the gate at build time rather than at first user click. Six entries today (nunchaku, sageattention, dflash-mlx, dflash-cuda, triattention, kvpress).
3. **scripts/e2e_test_suite.py phase 6**: two new read-only probes alongside the existing 7:
- ``vllm-wsl-status``: GETs /api/setup/install-vllm-wsl/status and asserts the JSON shape (phase + done fields present). Verifies the Phase 8 install endpoint at minimum returns the expected schema even when no install has been started.
- ``fu-056-capability-flags``: GETs /api/health and asserts all 7 FU-056 Phase 1 capability fields are present on ``nativeBackends``. The fields are optional in the schema (older backends shouldn't crash the frontend), but the gate ensures release builds expose them.
Phase 6 grows from 7 to 9 checks. Verified live against the user's running backend: PASS 9/9.
No new test files. Phase 9 is gate plumbing on existing scripts.
Caught during the live WSL test sweep: ``backend_service.app.main()``
hard-coded ``port=DEFAULT_PORT`` in the ``uvicorn.run`` call and ignored
the ``--port`` flag the test scripts have been passing. Worked
historically because DEFAULT_PORT already reads ``CHAOSENGINE_PORT``
env, so test runs that set the env var got the right port — but
``python -m backend_service.app --port 8877`` silently bound 8876.
Now ``main()`` uses argparse with env-var fallbacks:
--port → $CHAOSENGINE_PORT → 8876
--host → $CHAOSENGINE_HOST → 127.0.0.1
CLI > env > default. Surfaces ``--help`` properly (the user can
discover the args). The existing env-var path keeps working for the
Tauri shell + headless test scripts that already set ``CHAOSENGINE_PORT``.
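The CLI > env > default precedence can be sketched with argparse defaults read from the environment. This is a sketch, not the actual ``main()``: the `parse_args` wrapper and its injectable `env` parameter are assumptions added to make the precedence easy to test.

```python
import argparse
import os
from typing import Mapping, Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None,
               env: Optional[Mapping[str, str]] = None) -> argparse.Namespace:
    """Resolve host/port with CLI flags winning over env vars over defaults."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(prog="backend_service.app")
    # The env var only supplies the default, so an explicit flag overrides it.
    parser.add_argument("--port", type=int,
                        default=int(env.get("CHAOSENGINE_PORT", "8876")))
    parser.add_argument("--host",
                        default=env.get("CHAOSENGINE_HOST", "127.0.0.1"))
    return parser.parse_args(argv)
```

With this shape, ``--port 8877`` wins even when ``CHAOSENGINE_PORT`` is set, and the resolved values can be handed straight to ``uvicorn.run``.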
Three new helper scripts under ``scripts/`` for the WSL dev workflow:
- ``install_llama_server_wsl.sh`` — downloads the latest llama.cpp
Linux release into ``~/.chaosengine/bin/`` for the WSL backend.
- ``run_backend_wsl.sh`` — launches the backend on port 8877 with
auth disabled (env: ``CHAOSENGINE_REQUIRE_AUTH=0``), pointing at the
WSL-side llama-server. Detached via nohup + disown.
- ``probe_backend_wsl.sh`` — diagnostic helper; runs the backend
foreground for 3 s and surfaces import / bind errors.
WSL test sweep results (Ubuntu-24.04, RTX 4090, vllm-venv at 0.21.0):
- pytest tests/ — 1472 passed, 21 failed, 21 skipped
(49 more passes than Windows — fewer platform-specific failures)
- e2e_test_suite.py --smoke — 6/0/0 PASS including the two new
FU-056 Phase 9 phase-6 probes (vllm-wsl-status + capability flags)
- cache-strategy-matrix.py --quick — 0/0 ran, 15/15 skipped honestly
(only ``native`` strategy in dev venv; no turbo binary, no dflash,
no models in dev library — all skip reasons accurate)
Two related UX cleanups landed together because they share the same
plumbing pattern (App.tsx → tabs → leaf components):
1. **Hide MTPLX install affordances on Windows / Linux.** The MTPLX
block in RuntimeControls (the launch-settings modal that opens
from Chat / Compare / HTML Challenge / Benchmarks) used to render
the MTPLX checkbox + "Install MTPLX" button + info disclosure on
every host. MTPLX is Apple-Silicon-only — the install would error
on Windows and the checkbox would render disabled with no path to
recovery. Per the FU-034 rule (hide unrecoverable options, don't
grey them out), the whole block is now gated on a new
``isAppleSilicon`` prop threaded from App.tsx via:
App.tsx
→ LaunchModal / CompareView / HtmlChallengeTab /
BenchmarkRunTab
→ ChallengePickerModal (for HtmlChallengeTab)
→ ModelLaunchModal
→ RuntimeControls
Three call sites on RuntimeControls (the MTPLX label, the
info-panel expand, the info button) now ALL gate on the prop.
``dflash-mlx`` was already platform-gated via the FU-056 Phase 2
AcceleratorCard catalog (platformGate: "apple-silicon").
2. **Chat empty-state banner.** Fresh-install users opening the
Chat tab used to see "Send a message to start the conversation."
followed by a silent auto-load of the largest MLX direct variant
(a 15+ GB download that doesn't even work on Windows/Linux —
MLX backend doesn't exist there). Replaced with a
``<ChatEmptyStateBanner>`` that surfaces a clear CTA: "Browse
Discover" when library is empty, "Open Models" when models are
present but none loaded. No silent auto-loads, no confused users
waiting on the wrong download.
The banner is purely additive: the composer textarea stays usable
alongside it, so users can still type while the banner points them at Discover.
Plumbing:
- New ``src/utils/platform.ts`` with ``isAppleSiliconHost``,
``isCudaHost``, ``isIntelMac`` helpers. Reads from
``workspace.system`` (platform + arch) which the backend already
populates from ``platform.system()`` + ``platform.machine()``.
- 15 unit tests in ``src/utils/__tests__/platform.test.ts`` pin
every host-classifier branch (Darwin arm64, aarch64, Intel Mac,
Windows, Linux, null/undefined, case-insensitive).
- ``isAppleSiliconHost(workspace.system)`` computed once at App.tsx
top-level, threaded as ``isAppleSilicon`` prop to the four call
sites that own MTPLX surfaces.
- New ``<ChatEmptyStateBanner>`` component with two states
(no-models / no-loaded-model), each with appropriate CTA.
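The host classifiers in ``src/utils/platform.ts`` are TypeScript; the following is a Python rendering of the same idea for illustration. The function names, the `platform` / `arch` dictionary keys, and the exact arch strings accepted are assumptions based on the description of ``workspace.system`` above.

```python
from typing import Mapping, Optional

SystemInfo = Optional[Mapping[str, Optional[str]]]


def _field(system: SystemInfo, key: str) -> str:
    """Read a field case-insensitively, tolerating a missing system object."""
    if not system:
        return ""
    return (system.get(key) or "").lower()


def is_apple_silicon_host(system: SystemInfo) -> bool:
    # Assumption: Apple Silicon means Darwin reporting arm64 or aarch64.
    return _field(system, "platform") == "darwin" and _field(system, "arch") in ("arm64", "aarch64")


def is_intel_mac(system: SystemInfo) -> bool:
    return _field(system, "platform") == "darwin" and _field(system, "arch") == "x86_64"
```

Returning False for a missing system object means a caller gating on the result hides Apple-Silicon-only surfaces until the host is known.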
Tests: 35 files / 424 vitest tests pass (+15 from platform helper).
tsc clean. No new pytest needed — backend unchanged.
Not addressed in this commit (deferred):
- MLX-only image / video catalog variants still surface in Discover
/ Models tabs on Win/Linux. Filtering those is a larger UX call —
hide entirely vs. show with "Apple Silicon only" pill — deserves
its own decision before code.
- "llama-server installed by default" — already the case via
scripts/stage-runtime.mjs for release builds. No code change.
Extends the FU-034 "hide unrecoverable options" policy to whole
catalog rows. Windows / Linux users no longer see MLX / mlx-video /
mflux / MTPLX entries they can never run, and Apple Silicon users no
longer see vLLM / nunchaku / CUDA-only entries.
- src/utils/platform.ts: imageOrVideoVariantPlatformGate +
chatVariantPlatformGate + isVariantCompatibleWithHost derive a
PlatformGate ("apple-silicon" | "cuda" | "any") from existing variant
fields (runtime / backend / styleTags / repo prefix). No catalog
schema change required.
- ImageModelsTab / ImageDiscoverTab / VideoModelsTab / VideoDiscoverTab:
new hostSystem prop, filtered through isVariantCompatibleWithHost in
the rows/filteredResults useMemo.
- App.tsx: threaded workspace.system into all four tabs;
libraryChatOptions now also filtered so the launch dropdown drops
MLX backends on Win/Linux.
- AcceleratorsBoostPack: showIncompatible flipped off, the table now
surfaces only accelerators the current host can install.
16 new vitest cases pin the helper boundaries (Apple Silicon host
hides CUDA-only variants, Linux x86_64 hides Apple-Silicon-only
variants, "any" gate passes on every host, etc). All 440 frontend
tests pass; tsc clean.