aivrar / vllm-windows-build (19 stars)
Native Windows build of vLLM 0.21.0, with no WSL and no Docker. Pre-built wheels + 36-file Windows patch + 10 KV cache compression dtypes (6 Multi-TurboQuant + 4 upstream TurboQuant). PyTorch 2.11 + CUDA 12.6 + Triton + Flash-Attention 2.
Topics: windows, gpu, cuda, pytorch, nvidia, triton, msvc, quantization, kv-cache, awq, llm, llm-serving, vllm, llm-inference, flash-attention, qwen, kv-cache-compression, turboquant, vllm-windows, multi-turboquant
Updated May 19, 2026. Language: Python

palatalised-chancellorsville108 / turboquant-pytorch (0 stars)
Accelerate LLM KV cache compression with a PyTorch TurboQuant implementation for efficient, high-quality vector quantization.
Topics: windows, machine-learning, deep-learning, cpp, retrieval, gpu, cuda, transformers, pytorch, triton, attention, mlx, libtorch, kv-cache, apple-silicon, llm-inference, kv-cache-compression, vector-compression, multi-turboquant
Updated May 23, 2026